AN EVALUATION OF A COLLEGE COURSE IN
HUMAN RELATIONS* 


HAROLD KELLEY 


Yale University 
and 


ALBERT PEPITONE 


University of Pennsylvania 


The type of college course referred to by the term ‘human 
relations’ has proliferated in recent years as have those non- 
academic training workshops in which the participants learn 
leadership skills, group discussion methods, and a better understanding
of interpersonal relations.† It is probable that this
development reflects a strong motivation among social scientists 
and educators to provide learning experiences that have prag- 
matic value for participants in improving their abilities to handle 
various personal and interpersonal problems. The general goal 
of the course evaluated in this report was to improve the partici- 
pants’ ‘human relations skill.’ Particular attention is directed 
to these two questions: What evaluation methodology can be 
employed to measure the success of this and similar courses? 
and, What particular kinds of change in the area of human 
relations skill can be brought about? 





* The course was offered in the Department of Economics and Social 
Science at the Massachusetts Institute of Technology. We are indebted to 
Professors D. MacGregor and I. Knickerbocker who developed the course, 
to Professor M. Haire who supervised the teaching of the various sections, 
and to Mrs. Betty Lopez who did a major share of the content analysis. 

† See, for example, L. Bradford and J. R. P. French (Eds.), “The
Dynamics of the Discussion Group,” J. Soc. Issues, IV, 2; H. Thelen,
“Educational Dynamics: Theory and Research,” J. Soc. Issues, VI, 2.
A DESCRIPTION OF THE COURSE


The course made almost exclusive use of the discussion 
method. The content of the class discussions came from three 
interrelated sources: a) a comprehensive outline of psychological 
concepts and theories relevant to problems of human relations, * 
b) illustrations and applications of these concepts and theories 
presented by the students in the form of personal experiences, 
political and social ‘current events,’ anecdotes from movies, 
drama, literature, etc., c) the instructor’s definitions of concepts, 
elaboration of theories, and introduction of appropriate experi- 
mental evidence. There were no examinations. The student 
was graded principally on the basis of a ‘notebook,’ submitted 
three times during the semester, which was essentially a detailed 
application of the concepts and theories discussed in class. 
Readings were recommended but not formally assigned. 

The behavior of the various instructors was standardized as 
much as possible. Weekly meetings were held to ascertain the 
progress of each instructor on the course outline, to compare the 
effectiveness of various questions and issues in stimulating dis- 
cussion, to list the major points to be made, and to consider 
specific teaching techniques (e.g., the use of rôle-playing). Also
discussed were methods of producing and maintaining a class- 
room atmosphere of informality and friendliness. On the basis 
of decisions reached in these planning sessions, each instructor 
deliberately attempted to achieve an informal and ‘nondirective’ 
style of teaching. The following observations, obtained during 
nine fifty-minute class periods led by one instructor,† provide a
picture of how the instructors actually behaved in the classroom. 

Of the total volume of participation during each period, the 
instructor accounted for fifty-four per cent while the twenty 
students (a typical size for the sections) accounted for the 
remaining forty-six per cent. The contributions to the discus- 
sion initiated by the instructor were distributed as follows: 





* E.g., the concepts of need, barrier, conflict of forces, social interdepend- 
ence, group standards, social influence, conformity, social organization, etc. 
† These observations were obtained during the semester immediately
following the one in which the present research was conducted. They
probably represent a reasonably valid sample of the typical leadership style
used in this course.
                                                                    Per Cent

[Label illegible in source]...........................................  17
Setting limits to the discussion, ‘framing’ the topic, etc............  10
Criticizing and putting students ‘on the spot’ by direct questions....  15
Encouraging participation and group decision..........................  26
[Label illegible in source]...........................................  13
Promoting equality of status and expressing intimacy..................  19
                                                                       100


The instructor’s responses to student contributions were dis- 
tributed as follows: 


                                                                    Per Cent

Answering questions in a matter-of-fact manner........................  18
Rephrasing student questions..........................................   6
Accepting student contributions.......................................  33
Praising student contributions........................................  29
[Label illegible in source]...........................................  14
                                                                       100


It is clear that although the instructor plays a central part in 
the classroom process, his rôle is not the usual one of presenting
the course content, answering questions, and exerting tight con- 
trol over the direction of the discussion. Within the broad 
limits which he places upon the discussion, he provides great 
freedom and encourages the group to follow its main lines of 
interest. In his emotional relations with the group, for the most 
part he is friendly and a source of reward. Only a small part of 
his behavior implies rejection or negative evaluation of the 
student. 

As judged from their spontaneous comments, the great majority
of students found the course attractive. Several specific reasons 
can be identified: Some men felt the course was comfortable 
because it involved a minimum of work and no examinations. 
Many students felt more secure in this permissive classroom 
atmosphere than in the more formalized and intensive courses 
which occupied the major portion of their time in school. A few 
students reported that they were able to eliminate uncertainties 
connected with ‘forbidden subjects’ (e.g., sex, aggression, etc.). 
Finally, a considerable number said that they were able to express 
and pursue their strong interests in psychology. 
EVALUATION OF COURSE


The Subjects.—A total of one hundred forty-six men, mostly 
juniors, were involved in the study. The age range was eighteen 
to twenty-five, the average being around twenty-two. Most 
subjects were veterans whose college training had been interrupted,
but a few had entered college directly from high school.
Virtually all subjects were majors in some branch of engineering 
or science. The human relations course represented one of three 
humanities courses of which one was required. 

The Design.—In order to determine the changes produced by 
the course, part of the students were ‘measured’ at the beginning, 
others at the midpoint, and the remainder at the end of the 
semester. The final measurements were taken three months 
after the initial ones. 

A test-retest design was ruled out because: (1) the content of 
the measuring device was of such a nature as to be vividly 
remembered from one testing to the next, and (2) it appeared to 
be unfeasible to construct alternate forms of the instrument 
without an elaborate, time-consuming pretest of them. The 
same measuring instruments were therefore applied to different 
samples of the population at the three testing points. These 
samples corresponded to the seven sections of the course being 
taught that semester. The three sections taught by the authors 
and the course supervisor (who was involved in planning the 
evaluation program) were measured at the beginning of the course, 
specifically, during the third and fourth meetings. This ruled 
out the possibility that these investigators, in their rôles as
instructors, could influence the ‘scores’ of their students by 
selective teaching. Of the other four sections, two were measured 
at the midpoint and two at the end. The assignment of these 
sections to the midpoint and end groups was arbitrary. Thus 
‘measured’ were sixty-seven students from three sections at the 
beginning (instructors P, K, and H), forty-one from two sections 
at the midpoint (instructors L and E), and thirty-eight from two 
sections at the end (instructors L and D).* The instructors 
representing the midpoint and end groups were not informed of 





* The N’s in our data will vary somewhat due to absences. The measure- 
ments at any given point were taken during two successive class periods. 
the specific content of the measuring instruments and those who
became aware of the content did not discuss it in class. 
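
The allocation of sections to measurement points can be laid out as a small data structure, which also makes it easy to check the counts quoted in the text. This is only an illustrative sketch: the instructor letters and student counts are taken from the description above, while the names `MeasurementPoint` and `DESIGN` are assumptions introduced here and appear nowhere in the original study.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasurementPoint:
    """One testing occasion in the cross-sectional design (hypothetical name)."""
    label: str
    instructors: tuple  # one instructor letter per section measured
    n_students: int     # students enrolled in those sections


# Figures as reported in the text; each section is measured exactly once.
DESIGN = (
    MeasurementPoint("beginning", ("P", "K", "H"), 67),
    MeasurementPoint("midpoint", ("L", "E"), 41),
    MeasurementPoint("end", ("L", "D"), 38),
)

total_students = sum(p.n_students for p in DESIGN)
total_sections = sum(len(p.instructors) for p in DESIGN)

# Instructors who taught sections at more than one measurement point.
appearances = Counter(i for p in DESIGN for i in p.instructors)
repeated_instructors = {i for i, count in appearances.items() if count > 1}

print(total_students, total_sections, sorted(repeated_instructors))
```

The totals agree with the one hundred forty-six subjects and seven sections reported, and only instructor L appears at two measurement points.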

In order to determine whether the students measured at the 
three points during the semester differed in ways other than in 
amount of experience in this particular course, a comparison of 
the samples was made with respect to age, course major, number 
of previous courses related to human relations, and nationality 
as inferred from their surnames. There were no significant 
differences among the samples on any of these variables and 
those variations which did appear were neither consistent nor in a
direction to explain differences obtained on the measuring instru- 
ments. There is some justification, therefore, in assuming no 
major differences among the three samples except in the amount 
of training in this course. 

Since the three groups also differ (over a range of three and 
one-half months) in general maturation, exposure to the college 
environment, and general social experience, there is a problem in 
being able univocally to attribute any change to the specific 
experience of this course. As will be seen, the changes that do 
appear are highly consistent with the content of this course and 
as such would not be likely to result from the influence of other 
general factors operating over the same time period. 

A further problem of interpretation arises from the fact that, 
with one exception, different instructors are represented at the 
various measurement points. The facts suggest that the changes 
reflect experience in a certain type of course rather than under 
an instructor with a particular personality. Specifically, no 
major differences appear among instructors L, E, and D in the 
nature of the changes that took place under their leadership. 
This was to be expected since, as indicated previously, there was 
considerable attention given to standardizing the crucial aspects 
of instructor behavior. 

Course Objectives.—The general objective of the course was to 
bring about in the students a deeper understanding of human 
relationships, and thereby, ultimately, to provide the means to 
actual behavior changes in the student’s relations with friends, 
roommates, parents, and others. In order to specify criteria by 
which the success of the course, i.e., in increasing the student’s 
insight into social problems, could be measured, an attempt was 
made to define the basic content of the course. Working from 
the formal course outline of concepts and theories—on which
much of the class discussion was based—the writers developed a 
list of ‘human relations principles.’ If, in their written analyses 
of human relations problems, the students showed concrete 
evidence of having ‘applied’ one or more of these principles, the 
course could be assumed to have accomplished its major objec- 
tive. The following list of generalizations represents the particu- 
lar content areas in which change can be expected. 

(1) All humans enter life with a basic set of needs and with 
certain potentialities for developing motor, perceptual, and 
intellectual mechanisms for satisfying these needs. 

(2) Humans also enter life with a considerable degree of 
ability to learn. 

(3) They maintain considerable potentiality for relearning and 
modification of behavior patterns throughout their lives. 

(4) The person is in constant interaction with his social and 
physical environment. Behavior is a function of the environ- 
ment as well as the person. Behavior can only be incompletely 
understood in terms of factors ‘inside’ the person. 

(5) As the person interacts with his environment, he learns 
ways of satisfying his needs. He develops behavior patterns, 
channels of expression, and secondary needs which may be quite 
specific to the norms of his particular culture or the demands of 
his physical environment. 

(6) Each person governs his behavior in terms of his own pri- 
vate views of the world—his perceptual field—which may 
correspond closely or little to the real world. 

(7) An important behavior situation is that of deprivation 
where the person is blocked in his attempts to satisfy his needs. 
A deprivation situation may be frustrating to the person and 
lead to aggression, withdrawal, etc. 

(8) Frequently a person’s various needs conflict with each 
other. Conflict may result in emotional symptoms. It is 
frequently resolved by compromise, alternation, withdrawal, and 
other important styles of behavior. 

(9) The informal group memberships of the person, social 
institutions, and ‘culture’ are important origins of his social 
attitudes and behavior. 

(10) Interpersonal relationships and social organization in 
general are characterized by interdependence. Interdependence 
is the basic descriptive category of all social life and the fundamental
condition for social influence and change.

The major hypothesis to be tested, then, is that the course will 
deepen the student’s understanding of human relations and that 
such a change will be reflected in the increased frequency with 
which the above principles are applied in the analyses of social 
problem situations. 

Most of these principles when interpreted literally possess an 
apparent relationship to certain social attitudes. For instance, 
a full understanding of the interaction between person and 
environment as the causal equation for all social behavior implies 
on the one hand attitudes that reject exclusively inner determi- 
nants of behavior such as ‘will,’ physiology, instinct; and, on the 
other hand, attitudes that men are what they are in part because 
of certain environmental circumstances, etc. Thus, if the learn- 
ing of these principles affects the student’s value systems, we 
might expect, as a secondary hypothesis, changes in social 
attitudes that are broadly congruent with the principles. 

Measurement Instruments.—Since the major hypothesis seeks 
evidence for increased application of human relations principles, 
an important requirement for the measurement instrument is 
that it provide a problem which can be solved by the application 
of these principles. Another requirement is that the problem 
content be sufficiently removed from the classroom discussion so 
that it is a novel problem and not one that can be solved by 
repeating what was ‘memorized’ in class. Thirdly, the problem 
should be interesting enough to motivate the best efforts of the 
subjects. Fourth, the problem solutions should be amenable 
to some form of quantitative analysis. Finally, to examine the 
possibility of attitudinal change, the content of the problem 
should allow for the expression of positive or negative attitudes. 
The following problem ‘tests’ were constructed with these
criteria in mind.


“Management Techniques Problem”


Recently the Managers Association of America had an annual 
convention in which the topic, “Techniques of Management,”
was discussed from several points of view. At the end of a
week-long session the principal speakers summarized the discussion
as follows:
Manager A: “I believe that in order to keep up production a constant
check must be kept on all employees. The only way to get conscientious
performance is to expect and to secure discipline and immediate acceptance
of all orders. One should be careful not to spoil the employee
with too much praise. Employees get paid to carry out directives,
while managers get paid to make directives.”

Manager B: “I believe that managers should be interested in the 
employees in the most personal way possible. He should urge the 
employees to bring their problems to him. He alone should decide 
when a job is well done and when it’s poorly done. It is only when the 
organization resembles a happy family group that you can expect high 
production.” 

Manager C: “I believe that the best guarantee of good production is
to leave the employees alone as much as possible. It is necessary for
managers to realize that able employees can work out most if not all
problems they are confronted with. Aside from some initial training,
the employee must be given complete freedom in performing his tasks.”

Manager D: “I believe that the best method of management is to 
make it possible for the employee to take an active part in planning 
work assignments and making decisions related to his job. It is only 
when the employee understands the nature of his work and can partici- 
pate fully in solving related problems that you have good management.” 


* * * * *


It should be clear that although the above viewpoints expressed by 
the four managers are not mutually exclusive, they do represent fairly 
distinguishable patterns of management techniques. 

Briefly sketch the most likely employee behavior resulting from the 
application of each of the above policies. Give reasons for each in 
terms of what you know about human behavior. Also indicate which 
you think is the most satisfactory management technique. 


“Bursar Incident Problem”


Henry Bursar had been plant manager of Eastern Division for several 
years. He knew his men intimately. Each morning he would greet 
them by saying “Good Morning, Joe” or “Good Morning, Tom,” then
he would inquire about their families, illnesses, birthdays, etc. He felt
that every man’s business was his own. If Smith, for example, one of
the drill press operators, turned out particularly bad work one morning,
Bursar would know immediately that “wife trouble” was responsible
in Smith’s case and would make several appropriate recommendations. 
Similarly, if Hudson in the shipping department seemed distracted, 
Bursar would recognize that certain gambling debts were responsible.
He would call Hudson into the office, first give him Hell, then clap him
on the shoulder, push him out of the office, and holler, “Next time
behave yourself.” Hank Bursar likes to think that he is developing a
happy family group with himself as sort of a father. Having a natural 
distaste for formal organization, he personally directs all worker activi- 
ties. He arranges all plant social activities, he extends personal loans, 
he makes up Christmas Baskets, writes editorials in the company news- 
paper, and even makes weekly visits to the homes of many employees. 
In fact, most of the workers live in a housing project financed by Bursar. 

Morale was at a very high level until March of 1945. Then it all 
seemed to happen at once. Bursar, on his mid-morning inspection 
tour, found Peterson, Shaw, and Halloway joking in one corner of the 
machine shop. They saw him coming and stopped. Hank was angry 
and began lacing it in, but just as he was about to calm down as was 
customary, the group itself became unusually angry and yelled back at 
Bursar with extreme harshness. At the end of one minute, each of them 
in his own way had said, “I can’t stand any more of this, I quit” . . .
This shocked Bursar, but thinking that it had happened because he had 
been particularly irritable that day, he leaned over backwards to become 
nice. Yet all that following week incidents occurred similar to the one
just described.

* * * * *


Briefly analyze this situation with respect to such questions as the 
following: Why did this situation develop? What features would have 
allowed you to predict it as it developed? What could have been done 
to prevent it? 

Don’t necessarily answer these questions in order and, in any case, 
do not let them restrict your interpretation. They are to be considered 
only as general guides. 


“Alcoholism Problem”


D. T. is a heavy drinker and lately has gone on long binges with 
increasing frequency. He has been absent from work during these 
periods, but has given his boss no explanation in order not to reveal his 
weakness. As a consequence he is in danger of losing his job. On these 
‘week-ends’ he comes home after several days of absence, strongly 
berates his wife, and demands money from her to buy more liquor. 
During his last bout, she took their child and left their home to live 
temporarily with her folks. 

During his sober periods, D. T. has sought and received a lot of 
advice. This is what various people have told him: 
M: “You must get hold of yourself and make up your mind to stop
drinking. You can stop if you just set your mind to it and convince 
yourself that you can. What it takes is strong resolution and a firm 
will.” 

N: “I’m sure I can solve your problems. I’ll simply give you a drug 
which will make you nauseous at the odor of alcohol.” 

O: “Your symptom is the result of the interplay of many factors. 
Foremost among these is your relation to your wife. You depend upon 
her very much and yet you have a strong desire for freedom from her 
demands. Although you love her a great deal, your use of alcohol must 
be viewed as a way of defying her authority over your conduct. We’ll
have to work out your relation with her before going any further.”

P: “Your case is very simple. As you’ve told me, you have a strong 
desire for excitement and you find alcohol to be an easy means to fulfill 
this desire. What you must do is to seek other methods of obtaining 
excitement. I’d suggest that you get out and do things which involve 
some risk such as flying, skiing, or sailing.”

R: “You must face the fact that your father and his father before him 
were very heavy drinkers. They experienced exactly the same difficul- 
ties that you are having. I think your best chance to be cured is to 
submit to being placed in a rest home where you won’t be able to get 
liquor and to stay there until this period of your life has passed.”


* * * * *


Write a brief evaluation of each piece of advice given to D. T. Do 
this in terms of what right or wrong assumptions were made, what 
correct or incorrect principles of behavior were applied and what possi- 
bilities or considerations were overlooked by the person giving the 
advice. Also indicate which recommendation you consider to be the
best.


The problems were introduced to the classes as part of a study 
being carried out by another department. The students were 
repeatedly assured that their analyses of these problem situations 
would not influence their course grade. Twenty minutes were 
given for the analysis of each problem. 

Content Analysis of the Protocols.—Categories for the content
analysis of the raw data were constructed by: 1) randomly select- 
ing sample protocols of each problem from the total population 
of respondents, 2) making a detailed list of all major interpreta- 
tions, arguments, etc. contained in the protocols, 3) defining 
general categories on the basis of both the concrete material and 
certain human relations principles mentioned above. Specifically,
the third step involved combining the ‘empirical’ statements
into more general categories defined in terms of one or
more of the following principles: 


—person-environment interactional conception of causation, 

—the concept of perceptual field and its function in govern- 
ing behavior, 

—the concept of frustration and its theoretical consequences, 

—the concept of need conflict and its theoretical conse- 
quences. 


All of the protocols were then coded by an assistant without 
knowledge as to which measurement point was represented by 
any given case. When a difficult case made it necessary to alter 
the definition of a category, all protocols that had already been 
coded were re-examined and, if necessary, recoded. 
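
The blind-coding step described above can be sketched in a few lines. This is a hypothetical illustration of the general idea, not the procedure actually used by the authors' assistant: the function name `blind_protocols`, the case-identifier format, and the placeholder texts are all assumptions made here.

```python
import random


def blind_protocols(protocols, seed=0):
    """Shuffle protocols and hide the measurement point from the coder.

    `protocols` is a list of (measurement_point, text) pairs. Returns
    (blinded, key): `blinded` holds only an arbitrary case id and the text;
    `key` maps each case id back to its measurement point and is withheld
    from the coder until coding is finished.
    """
    rng = random.Random(seed)
    order = list(range(len(protocols)))
    rng.shuffle(order)

    blinded, key = [], {}
    for code, idx in enumerate(order, start=1):
        point, text = protocols[idx]
        case_id = f"case-{code:03d}"
        blinded.append({"id": case_id, "text": text})
        key[case_id] = point
    return blinded, key


# Placeholder texts only; no real protocol content is reproduced here.
sample = [
    ("beginning", "placeholder protocol 1"),
    ("midpoint", "placeholder protocol 2"),
    ("end", "placeholder protocol 3"),
]
blinded, key = blind_protocols(sample)
```

Keeping the key separate from the blinded list means the measurement point is restored only at the tabulation stage, after every protocol has been assigned a category.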


RESULTS 


Insight into Human Relations.—Two exhaustive categories 
were set up to describe the degree of human relations insight 
shown in the “Management Techniques Problem.” Taken
as evidence of deeper understanding of human relations are those 
analyses in which the effects of various management techniques 
are derived from causal factors operating in the worker and his 
environment—that is, in terms of the interaction of specific 
worker motives and specific environmental variables. There is 
frequent reference to theories of frustration, conflict, and ego- 
need satisfaction. At the superficial level of analysis, the effects 
of management behavior tend to be seen as consequences of 
characterological and class attributes of the workers which are 
relatively independent of the specific social environment. There 
are fewer references to psychological concepts and theories. To 
illustrate the distinction between the two levels of analysis, 
excerpts from raw protocols are given below. 


Interactional-conceptual treatment of management-worker relations.— 
“Obviously, this manager is employing unreasonable reduction tech- 
niques on his workers. They will probably come to feel frustrated and 
will strongly resist the manager’s severity. They will undertake 
aggressive means to get back at the boss . . . only their (the workers’) 
active participation in planning can secure their coöperation, not by
frustrating their self-esteem and needs for independence.”
Superficial treatment of management-worker relations.—“This technique
can be used successfully in large scale production-type businesses where
employees do not expect personal treatment but only justice and when
employees are of low mentality and so constituted, like Teutonic
temperament, that they react well to rigid discipline.”


A similar categorization was made for the “Bursar Incident”
problem. In the superior category are explanations of the 
‘incident’ formulated in terms of the interaction between the 
social environment created by Bursar and the needs, values, etc. 
of the workers in the plant. The causes of the ‘blow-up’ lay 
in the frustration of and threats to needs by environmental 
circumstances directly or indirectly attributable to Bursar’s 
behavior. Again the superficial students tended to diagnose 
the incident in terms of vague causal agents inherent in the 
workers or situation. The excerpts below illustrate the two
levels.


Interactional-conceptual treatment of the Bursar Incident.—“By
attempting to dominate the workers’ lives completely, Bursar frustrated 
certain essential needs of his men including ‘egoistic’ self-expression and 
independence. . . . The workers were in a conflict situation—which 
gradually became unbearable—they resented Bursar’s domination but 
had to remain quiet in order to keep their jobs.” 


Superficial treatment of the Bursar Incident.—“This situation is the
result of the plant manager’s overextending himself to be one of the 
boys. To a certain extent, interest in the employees is helpful on the 
part of management but not when carried to the extremes carried 
here. . . . In essence, the manager has set himself too high in importance
to succeed with his men.”


It was found possible to code another dimension of insight into 
this type of human relations problem; namely, the reasons for 
the manager’s excessively paternal behavior. Some students 
explored rather fully this side of the problem situation, while 
others did not touch upon it at all. A variety of theoretical 
explanations were advanced to account for Bursar’s general 
relations with his employees, and, particularly, his inability to 
understand the workers’ ultimate aggressive explosion. Such 
theories took one or more of the following forms: 

a) Indicates that Bursar’s nurturant behavior and creation of 
maximum employee dependence masks a strong dominance need 
or a pervasive hostility expressed toward people in subordinate
rôles.

b) Indicates that Bursar’s compulsive friendliness reflects a 
basic frustration and represents a compensation for deep-seated 
feelings of inadequacy or a means of alleviating guilt. 

c) Indicates that Bursar’s surprise or shock over the ‘incident’ 
is due to his seriously distorted perception of worker needs and 
ways of satisfying them. 

The students’ remarks concerning the efficacy of and rationale 
behind the several recommendations given to the alcoholic were 
interpreted as a whole and coded in terms of the conceptions of 
causality suggested or explicitly stated. Again two categories 
were employed. On the sophisticated level are those causal 
analyses of alcoholism which place primary emphasis on the 
interactions of personal factors with the environment. Excessive 
consumption of alcohol is often regarded as a means of substitute 
satisfaction of needs thwarted by certain environmental circum- 
stances, as a means of relieving tensions arising from person- 
situation conflicts, etc. On the other hand, primitive concep- 
tions imply or state unequivocally that alcoholism is chiefly the 
result of a single factor such as weak character or will, lack of 
moral fibre, or that it is almost exclusively caused by physiologi- 
cal habituation, a simple ‘need’ for excitement, etc. The 
excerpts below illustrate the two categories: 


Interactional-conceptual treatment of alcoholism.—“Undoubtedly DT
has a deep seated conflict which he is trying to resolve through drinking. 
Perhaps he wishes to escape responsibility imposed by his family or 
work situation. . . . The drug, however, does not alleviate the problem 
—the cause of DT’s behavior. If drugged, DT would only proceed 
to find another means of relieving the tensions arising from basic 
frustrations.” 


Superficial treatment of alcoholism.—“The suggestion of will power is
good advice. DT’s problem is made simple. Prestige is offered if DT 
stops in that he will have shown strong resolution.” 

“By providing actual excitement in his life, DT will probably see no 
further need for the artificial stimulus provided by alcohol.”


Table I summarizes the frequency with which the superior 
categories occur in responses to these three problems. It can 
be seen that the percentage of students falling at each of the 








206 The Journal of Educational Psychology 


three measurement periods rises consistently. Within the same 
period, of course, there is a systematic drop in the percentage of 
superficial treatments. In the Alcoholism problem, the percent- 
age of cases in which no causal interpretation is made also drops 
consistently. Chi-square tests indicate that all but one of the 
changes are statistically significant. It is thus justifiable to 
conclude that in terms of the frequency with which human 
relations principles were applied in the analysis of novel problem 
situations, the course produced a significant improvement in the 
students’ insight. 


TABLE I.—INSIGHT INTO HUMAN RELATIONS PROBLEMS

                                        Beginning    Middle       End      Chi-Square      P
Management Techniques Problem           (N = 66)    (N = 42)   (N = 41)
  Interactional-conceptual treatment
  of manager-worker relations               8%         33%        47%     21.89 (2 df)   <.01
Bursar Incident Problem                 (N = 62)    (N = 41)   (N = 36)
  Interactional-conceptual treatment
  of Bursar Incident                       13%         17%        61%     29.82 (2 df)   <.01
  Theoretical analysis of
  Bursar’s behavior                     (N = 66)    (N = 41)   (N = 38)
                                           15%         34%        50%     14.59 (2 df)   <.01
Alcoholism Problem                      (N = 68)    (N = 42)   (N = 40)
  Interactional-conceptual treatment
  of alcoholism                            43%         48%        70%      7.88 (2 df)   <.02
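
The chi-square values in Table I can be approximated from the published percentages. The Python sketch below is an illustration only: the cell counts are reconstructed by rounding each percentage times its N (an assumption, since the original counts are not reported), and the function name `chi_square` is ours. For the Management Techniques Problem the reconstructed counts give a value close to the published 21.89.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of period and category.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Assumed counts: round(8% of 66), round(33% of 42), round(47% of 41).
n_per_period = [66, 42, 41]
interactional = [5, 14, 19]
# One row per measurement period: [interactional, all other treatments].
table = [[k, n - k] for k, n in zip(interactional, n_per_period)]

stat, df = chi_square(table)
print(f"chi-square = {stat:.2f} with {df} df")
```

Because the counts are rebuilt from rounded percentages, small departures from the published figures are to be expected for the other rows of the table.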


Attitudinal Changes.—Although the categories discussed in the
previous section describe primarily the intellectual aspects of
the data, i.e., human relations principles, it is obvious that along
with various concepts of causality, psychological theories, etc.,
the protocols contain much attitudinal content. The data from
the Alcoholism Problem, for example, include both sympathetic
and negative evaluations of alcoholics or Mr. D.T. in particular.
The latter is sometimes ridiculed, made out as a social freak, or
given up as hopeless in view of his genealogy or physiology.
More often the alcoholic is regarded with sympathy, as a sick 
person in need of careful treatment. These and other forms of 
implicit or explicit attitudes, however, were so closely inter- 
woven with material reflecting intellectual principles that they 
could not be cleanly isolated in either the Alcoholism or the 
Bursar Incident Problem. 

The Management Techniques Problem did permit a categoriza- 
tion of positive or negative attitudes independently of intellectual 
material. An examination of each protocol as a whole revealed 
rather clear cut differences in regard to the evaluation of workers. 
Some subjects focus their analysis upon the needs, feelings, and 
general welfare of the workers. Worker morale and job satis- 
faction are seen as important problems and concern is expressed 
over potential or actual threats to the self-esteem of the worker. 
Other subjects tend to regard workers as chronically dissatisfied, 
characteristically disobedient, and often openly seeking to crush 
the authority of management. Concern is often expressed over 
possible jeopardies to the status and prerogatives of the ‘boss.’ 
Finally, there are subjects who focus almost entirely upon pro- 
ductivity and worker efficiency. There is frequently a categori- 
cal assertion that workers as a group are inefficient, lazy, and 
incapable of assuming responsibilities. The following excerpts 
illustrate these relatively positive and negative attitudes. 


Relatively positive attitude.—“Workers need leadership, true. But
they want to feel part of a team. Workers whose efforts are not
rewarded lose interest and become rebellious. This man’s ideas seem
like part of the feudal system. Workers are human beings and should
be treated as such.”


Relatively negative attitude.—“The security angle of this technique is
OK and it should be used if the management can keep a respected posi- 
tion. It usually happens though that workers will lose authoritative 
respect for management, may start being careless in their work, and 
expect to fix it up with the boss.” 


Table II shows the percentages of subjects falling into the 
positive and negative attitude categories at each of the three 
measurement points. It can be seen that the frequency of 
positive attitudes increases systematically from the beginning 
to the end of the course while during the same period the expres-
sion of negative attitudes decreases. This change is statistically
significant (Chi-square = 6.99, p <.05). 


TABLE II.—ATTITUDES TOWARD WORKERS IN THE MANAGEMENT
TECHNIQUES PROBLEM

                                            Beginning    Middle       End
                                            (N = 62)    (N = 39)   (N = 39)
Relatively positive evaluation of workers      32%         44%        59%
Relatively negative evaluation of workers      68%         56%        41%
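
For a chi-square statistic with exactly 2 degrees of freedom, the upper-tail probability has the closed form exp(-x/2), so the significance level attached to the value 6.99 can be checked by hand. The Python sketch below applies that identity; the function name is illustrative, and the assumption of 2 df (three measurement periods by two attitude categories) is ours, since the degrees of freedom are not stated for Table II.

```python
import math


def chi_square_p_2df(x):
    """Upper-tail probability of a chi-square variate with exactly 2 df."""
    return math.exp(-x / 2.0)


p = chi_square_p_2df(6.99)
print(f"p = {p:.3f}")
```

The closed form holds only for 2 df; other degrees of freedom require tables or a statistical library.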


The relationship between insight into the causal relations of 
this problem and attitudes toward the worker is moderately high 
and positive when the data from all three measurement periods 
are combined. Those who show a deeper insight into manage- 
ment-worker relations tend also to express positive attitudes 
toward the workers. The data show further that this relation- 
ship becomes closer at each succeeding measurement point. 
This suggests specifically that as the student adopts an inter- 
actional conception of causation, his attitudes toward certain 
groups change in the direction of greater tolerance. Unfor- 
tunately, because of the small frequencies involved, it is not 
possible to determine the reliability of this trend. 

In summary, there is evidence that the course produced 
significant changes on both the intellectual and attitudinal levels. 
The finding that subjects, in analyzing three problem situations, 
increasingly employed an interactional or ‘field’ conception of 
causality, psychological concepts and theories, confirms the 
major hypothesis of the study that “the course will deepen the
students’ understanding of human relations.” There is also
some evidence in support of the secondary hypothesis concerning 
the effects of the course upon change in social attitudes. 

The changes described in the previous section may, with fair 
assurance, be attributed exclusively to the course experience. 
It is unlikely that students who did not take the course would 
have changed in the particular ways indicated. The remaining 
research problem, of course, is that of isolating the particular 
components of the course responsible for such changes. In the 
light of what is known about the present course and general 
theories of change, it may be worth while to pose several ques-
tions that can provide some direction for future research:

1) How does student participation affect change? Does 
participation increase motivation and thereby produce better
learning? Does participation reduce various resistances to 
change? Would as much change be produced by a more formal 
teaching method such as the lecture? 

2) How does the use of broad principles and theories about 
human relations—as opposed to empirical ‘facts’—affect change? 

3) What is the effect of verbal and written application of 
human relations principles to concrete problems? Would there 
have been as much change if the discussion had been confined to 
the ‘definition’ of principles, or if the notebooks were not 
required? 

4) In what way and to what extent does a field theoretical 
approach to problems of causation and motivation affect social 
attitudes? 

5) What effect does the general classroom ‘atmosphere’ have 
upon learning? Is a friendly and informal situation especially 
conducive to the learning of human relations content? 

As these empirical questions become clarified through further 
research they should be transformed into higher order conceptual 
questions. At this point it is likely that controlled laboratory 
experimentation will produce results of greater generality. 










THE DIFFERENTIAL APTITUDE TESTS AS 
PREDICTORS OF ACHIEVEMENT TEST 
SCORES* 


JEROME E. DOPPELT and ALEXANDER G. WESMAN 
The Psychological Corporation 


The primary function of aptitude tests is to predict meaningful 
criteria. But what constitutes a meaningful criterion? Very 
often the criterion area can be readily stated in terms such as 
success in school or performance on the job; but the translation 
of success or performance into a measurable and unambiguous 
variable raises a number of difficult problems. Teachers’ grades, 
for example, not only include an evaluation of the student’s 
mastery of subject matter, but also a rating of the student’s 
effort, verbal fluency, work habits, conscientiousness, and other 
aspects of his personality. For the teacher’s purposes all these 
factors, and probably quite a few others, should rightfully be 
included in an evaluation of the student. In test validation, 
however, the complex nature of teachers’ grades tends to obscure 
some of the relationships which exist between the test and various 
aspects of grades. Further, the combination of data for students 
rated by different teachers raises the problem of rating standards 
and the inevitable question: “Does an A given by Teacher X
mean the same thing as an A given by Teacher Y?”

Despite the limitations of teachers’ grades as statistical vari- 
ables, we must recognize that grades are criteria in a very real 
sense—they are actually the principal evaluation in most school
situations. It is therefore essential that tests intended as
predictors be correlated with grades. It is equally essential that 
the interpretation of such coefficients take into account the fact 
that many aspects of the criterion variable cannot be expected 
to correlate highly with the test variable. 

Occasionally we are able to obtain measures of success in school 
which are relatively pure as compared with teachers’ grades. 
Achievement tests administered at the end of a course give an 





* Presented as a paper at the American Psychological Association Con- 
vention, September, 1951. 




evaluation of students in terms of content mastery without 
regard to teacher-judged personality factors. For statistical 
studies the use of achievement scores as criteria is obviously 
desirable. Admittedly, subject matter competence is only one 
aspect of the school’s work, but it must also be admitted that it is 
one of the more important aspects. 

This paper reports some of the results of two studies of the Differential
Aptitude Tests¹ in which the criteria were scores on standardized
achievement test batteries. The first study was made at
the Ames High School in Ames, Iowa. The Differential Aptitude
Tests were given in November 1948 to students in Grades X, XI
and XII. One year later, September 1949, these students were
given the Iowa Tests of Educational Development.² The Differential
Aptitude Tests include: Verbal Reasoning, Numerical
Ability, Abstract Reasoning, Space Relations, Mechanical 
Reasoning, Clerical Speed and Accuracy, Spelling and Sentences 
(Grammar). The scores obtained from the Tests of Educational 
Development are: 1) Basic Social Concepts, 2) Natural Sciences, 
3) Correctness and Appropriateness of Expression, 4) Quanti- 
tative Thinking, 5) Interpretation—Social Studies, 6) Inter- 
pretation—Natural Sciences, 7) Interpretation—Literary Materi- 
als, 8) General Vocabulary, 9) Use of Sources of Information 
and 10) A Composite score based on Tests 1-8. The eight 
scores of the DAT were correlated against the ten scores yielded 
by the achievement battery for each grade and sex. The num- 
bers of cases in the six groups ranged from forty-four to sixty-six. 
The complete tables showing each of these correlation coefficients 
may be found in the revised (1952) Manual for the Differential 
Aptitude Tests.¹

The data for the six groups shown in those tables reveal a con- 
siderable number of sizable coefficients. In most instances the 
high correlations are between tests which one would expect to be 
closely related. The DAT Numerical Ability test turns out to be 
a good predictor of TED Quantitative Thinking. For five of the 
six groups the correlation coefficients between these two tests 
are .80 or higher. Correctness and Appropriateness of Expression
(test 3) is highly correlated with DAT Sentences. The coefficients
range from .57 for Grade XII Boys to .89 for Grade XI
Boys. The TED General Vocabulary test is predicted best by
the Verbal Reasoning test in all groups. The coefficients between
Verbal Reasoning and General Vocabulary range from .69 to .88.

¹ Published by The Psychological Corporation, New York.
² Published by Science Research Associates, Inc., Chicago.

There are some coefficients which, at first glance, are quite 
surprising. For Grade XI Girls, for example, the correlation 
between TED Correctness and Appropriateness of Expression 
and DAT Numerical Ability is .71, whereas the coefficient 
between the same achievement test and DAT Sentences is .68. 
Although it may well be pure chance that is responsible for the 
higher correlation of an English test with Numerical Ability than 
with Sentences, (particularly since the number of pupils is fifty- 
three), it is noteworthy that the coefficients between Correct 
Writing and Numerical Ability are high for all three groups of 
girls (.69, .71 and .74). 
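
The chance explanation can be made concrete with Fisher's z transformation, which gives an approximate confidence interval for a correlation coefficient. The Python sketch below is an illustration under the usual bivariate-normal assumption, not a computation from the original study; the function name and the 95 per cent level are our choices. With r = .71 and N = 53 the interval is wide enough to include .68.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a correlation via Fisher's z."""
    z = math.atanh(r)                # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)      # standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


low, high = fisher_ci(0.71, 53)
print(f"95% interval for r = .71, N = 53: {low:.2f} to {high:.2f}")
```

Since both coefficients come from the same pupils, a formal test of their difference would need a method for dependent correlations; the interval above only indicates how much sampling variation a single coefficient carries at this sample size.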

Another point of interest is the relationship between the Verbal 
Reasoning test and each of the three tests of specialized reading 
comprehension. These three are the interpretation tests, 
designed to measure the pupil’s ability to do critical thinking 
in the areas of the social studies (test 5), the natural sciences 
(test 6) and literature (test 7). It appears that for all groups, 
the DAT Verbal test is a very good predictor of scores on the 
three comprehension measures; more than two-thirds of the 
eighteen coefficients between Verbal Reasoning and the reading 
comprehension tests are .70 or higher. 

The Composite score of the TED battery is intended to give a 
reasonable approximation of the general level of the pupil’s 
educational development. (It is obtained by finding the sum of 
the standard scores on tests 1-8 and changing this sum into a new 
standard score.) The Verbal Reasoning test, given a year earlier, 
is very highly correlated with the Composite score; the range 
of coefficients is from .71 to .90. 

The data permit answers to two basic questions. First, can the 
Differential Aptitude Tests satisfactorily predict performance in 
each of the achievement areas? In other words, is each achieve- 
ment test well predicted by at least one of the aptitude tests? 
If, for each achievement test, we consider only the coefficient 
which represents the best prediction of it, we find that for Grade 
X the highest coefficients range from .60 to .81; fifteen of the 
twenty* coefficients are equal to .70 or higher. In the eleventh
grade, the best predictions range from .64 to .89 and eighteen of 
the twenty exceed .70. For the twelfth grade, the range is from 
.47 to .90 with fourteen of the high coefficients over .70. In
this study, tests of the aptitude battery are good predictors of 
achievement scores obtained one year later. 

Second, is each of the Differential Aptitude Tests a useful 
predictor? The answer to this question may be found by con- 
sidering the data for each aptitude test individually. We find 
that the best validity coefficients for the respective Differential 
Aptitude Tests are: 





Verbal Reasoning              Highest r = .90 with Composite Score, Grade XII Girls
Numerical Ability             Highest r = .85 with Quantitative Thinking, Grade XII Boys
Abstract Reasoning            Highest r = .80 with Composite Score, Grade XI Boys
Space Relations               Highest r = .64 with Interpretation in Natural Sciences, Grade XI Boys
Mechanical Reasoning          Highest r = .71 with Natural Sciences, Grade XII Boys
Clerical Speed and Accuracy   Highest r = .49 with Correctness and Appropriateness of Expression, Grade X Girls
Spelling                      Highest r = .78 with Correctness and Appropriateness of Expression, Grade XI Boys
Sentences                     Highest r = .89 with Correctness and Appropriateness of Expression, Grade XI Boys


Since these are deliberately selected coefficients, they might 
possibly be freaks. In opposition to this hypothesis, however, 
we find that in every instance the second best prediction accom- 
plished by each test closely approximates the best. We may 
thus conclude that each of the Differential Aptitude Tests is, in this
situation, a genuinely useful predictor of one or more of the areas
measured by these achievement tests.

* There are ten achievement scores and separate predictions are made for
each sex. Since we are considering the highest coefficient for each of the ten
scores, by sex, we have twenty such coefficients in each grade.

The second study was based on students from Cincinnati, 
Ohio, and Dover, New Jersey, who were tested with the Differ- 
ential Aptitude Tests in 1947 when they were in Grade IX. These 
students were again tested in 1950 with the Essential High School 
Content Battery,³ which yields five scores: Mathematics, Science,
Social Studies, English (Language Arts) and a Total Score. Cor- 
relation coefficients were computed between the DAT and the 
EHSCB for one hundred six boys and for one hundred thirty-six 
girls. The complete tables of coefficients are also shown in the 
revised Differential Aptitude Tests Manual cited above. 

The demands made on the aptitude tests in this study are 
particularly severe, since three years had elapsed between the 
aptitude and achievement testing programs. Nevertheless,
though the validity coefficients are not quite so high as they were 
in the one-year studies, they are very useful. 

We may again pose the two questions we asked earlier:—Do the 
Differential Aptitude Tests predict each of the achievement areas? 
and Is each of the Differential Aptitude Tests a useful predictor? 
Both of these questions are answered affirmatively. After three 
years, the best predictions of the achievement scores are shown 
by the following validity coefficients: 


Mathematics      r = .66 with DAT Numerical, Boys
Science          r = .65 with DAT Verbal Reasoning, Boys
Social Studies   r = .58 with DAT Verbal Reasoning, Girls
English          r = .66 with DAT Sentences, Girls
Total Score      r = .75 with DAT Verbal Reasoning, Boys


As for the individual Differential Aptitude Tests, the highest 
coefficients for each are: 


Verbal Reasoning Highest r = .75 with Total Score, Boys 
Numerical Ability Highest r = .66 with Mathematics, Boys 
Abstract Reasoning Highest r = .55 with Science, Boys 
Space Relations Highest r = .50 with Mathematics, Girls 
Mechanical Reasoning Highest r = .43 with Science, Boys 
Clerical Speed and Accuracy Highest r = .33 with Mathematics, Boys 
Spelling Highest r = .62 with English, Boys 
Sentences Highest r = .67 with Total Score, Girls 





³ Published by World Book Company, Yonkers-on-Hudson, New York.
These data are not as glamorous as those reported for the one-
year studies. Considering the time interval, however, they
represent a solid success in prediction. 

It was found that several of the Differential Aptitude Tests are 
about equally effective in predicting some of the achievement test 
scores. The correlations of the Spelling and Sentences tests with 
English for boys in the three-year study (.62 and .64) are prac- 
tically the same as the correlation between Verbal Reasoning and 
English (.65). In the same group, the Verbal and Numerical 
tests are equally predictive of Mathematics achievement (.65 
and .66). For the group of girls, the Total Score, which is a 
median of standard scores on the four tests, is almost equally 
correlated with both Verbal Reasoning (.66) and Sentences (.67). 
An obvious advantage of several good predictors is the oppor- 
tunity afforded the counselor to base his judgments on more than 
one test. In this connection, some of the lower coefficients can 
also be very helpful. The DAT Mechanical Reasoning test, for 
example, correlates with Science Achievement to the extent of 
.43 for boys and .41 for girls. Admittedly, these coefficients 
are not so high for either sex as to make possible very accurate 
predictions of science achievement. They are, however, high 
enough to give a counselor corroborative evidence, particularly 
with regard to very high- or very low-scoring individuals. 

If multiple regression techniques are used, the relationships 
between the aptitude and achievement scores are, of course, 
increased. For boys in the three-year study, for example, 
a correlation coefficient of .66 was found between the DAT 
Numerical test and the Mathematics test of the Essential High- 
school Content Battery. If the DAT Verbal, Numerical and 
Clerical tests are used as predictors, the multiple correlation 
coefficient rises to .78. Obviously, cross-validation on secondary 
samples would be needed before such relationships could be 
entirely accepted. However, the zero-order coefficients are 
sufficiently high in both studies to give ample promise of satis-
factory prediction of the scholastic measures from the aptitude
scores.
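
The gain from combining predictors depends on how strongly the predictors correlate with each other. For two predictors the multiple correlation follows from the three zero-order coefficients: R² = (r1² + r2² - 2·r1·r2·r12) / (1 - r12²). The Python sketch below uses the reported correlations of .66 (Numerical) and .65 (Verbal) with Mathematics for boys; the predictor intercorrelation of .50 is a hypothetical value chosen only for illustration, since it is not given in the text, and the result is not the .78 reported for three predictors.

```python
import math


def multiple_r_two_predictors(r1, r2, r12):
    """Multiple correlation of a criterion with two predictors.

    r1, r2 : zero-order correlations of each predictor with the criterion
    r12    : correlation between the two predictors
    """
    r_squared = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return math.sqrt(r_squared)


# r12 = 0.50 is hypothetical; the source does not report it.
R = multiple_r_two_predictors(0.66, 0.65, 0.50)
print(f"R = {R:.2f}")
```

The lower the intercorrelation between the predictors, the more the second test adds to the first, which is one reason low intercorrelations within an aptitude battery are desirable.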

Most of the best predictions can be made from one or two tests. 
In the one-year study, Verbal Reasoning is the best single pre- 
dictor and Verbal Reasoning and Sentences together account for 
fifty of the sixty highest coefficients with achievement. In the 
three-year study, the same two tests account for eight of the ten
highest coefficients with achievement. Inspection of the achieve-
ment batteries indicates that this is probably due to the large 
verbal or language factor which runs through the achievement 
measures. Evidence for this may be found in a study of the 
intercorrelations of test scores. The median intercorrelation 
coefficients among the Iowa Tests of Educational Development and
the Essential High-school Content Battery are shown for the 
various groups in Table I. For comparative purposes, the 
corresponding medians for the Differential Aptitude Tests are 
shown. 


TABLE I.—MEDIAN COEFFICIENTS OF INTERCORRELATION AMONG
THE APTITUDE AND ACHIEVEMENT BATTERIES

One-year Study
                            DAT                TED*
                        Boys    Girls      Boys    Girls
Grade X ............    .40     .48        .65     .76
Grade XI ...........    .50     .40        .74     .71
Grade XII ..........    .40     .53        .60     .79

Three-year Study
                            DAT               EHSCB*
                        Boys    Girls      Boys    Girls
                        .30     .40        .62     .60

* The Composite Score in TED and the Total Score in EHSCB were not
used in computing the medians.
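
A median intercorrelation of the kind reported in Table I is the median of the off-diagonal entries of a battery's correlation matrix, with each pair of tests counted once. The Python sketch below shows the computation; the four-test matrix is hypothetical and is not taken from either battery.

```python
from statistics import median


def median_intercorrelation(matrix):
    """Median of the correlations between distinct tests (upper triangle only)."""
    k = len(matrix)
    pairs = [matrix[i][j] for i in range(k) for j in range(i + 1, k)]
    return median(pairs)


# Hypothetical correlation matrix for a four-test battery.
hypothetical = [
    [1.00, 0.60, 0.55, 0.70],
    [0.60, 1.00, 0.65, 0.62],
    [0.55, 0.65, 1.00, 0.58],
    [0.70, 0.62, 0.58, 1.00],
]

m = median_intercorrelation(hypothetical)
print(f"median intercorrelation = {m:.2f}")
```

Composite or total scores would be left out of such a matrix, as the table footnote indicates, because they are built from the other tests and would inflate the median.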


The rather high interrelationships among the tests of the 
achievement batteries indicate considerable overlapping of 
measurement among those tests. This is not to be taken as 
condemnation of the achievement tests. Language or verbal 
ability plays a large part in all the areas measured and the tests 
would necessarily overlap to a considerable extent. It is there-
fore quite reasonable to find that it takes only one or two of the
aptitude tests to predict effectively these particular achievement 
scores. 

The prediction of scores on the achievement tests used in this 
study is but a small part of the task for which a battery of apti- 
tude tests is prepared. Such a battery must also be useful for 
the prediction of success in educational areas not covered by these 
particular achievement tests (e.g., vocational course work), in 
careers and in jobs. When the intercorrelations among the tests 
of an aptitude battery tend to be relatively low there is the 
possibility (not the guarantee) of effective prediction in areas 
outside the traditional academic situation. 

The success of aptitude tests as predictors for different areas 
of endeavor will of course vary from one situation to another. 
The need for local validation studies has been well established. 
In the present studies, the Differential Aptitude Tests were found 
to be effective predictors of scores on two reliable achievement 
batteries; and the intercorrelations among the aptitude tests 
suggested the possibility of predicting success in areas not 
measured by the achievement tests. Further investigations of 
these aptitude tests in various job situations and industrial 
settings would be well worth doing. 








HOW INVALID ARE MARKS ASSIGNED 
BY TEACHERS? 


ROBERT SCRIVEN CARTER 


Denison University 


With the rapid development of objective testing procedures 
in the United States, it was to be expected that there would be 
numerous investigations concerning teachers’ marks. Mathe- 
matics, traditionally a subject which lends itself to objective 
measurement, has come in for its share of these investigations. 
There is a scarcity of investigations of the validity of teachers’ 
marks in beginning algebra. Most studies devoted to the ques- 
tion of teachers’ marks have been carried out with respect to 
elementary school arithmetic or with plane geometry. The 
latter, usually an elective subject, is not necessarily subject to 
the same factors in teachers’ assignment of marks. Of the 
research in the elementary school field, a great portion is devoted 
to a discussion of sundry philosophical aspects of the question or 
an evaluation of the theoretical implications of marks in general. 


THE PROBLEM 


The investigation was designed to determine whether or not 
teachers tend to favor one sex and whether the sex favored tends 
to be determined by the sex of the teacher. The study sought an 
answer to the problem: With intelligence held constant, what is 
the relationship between the sex of the student and the sex of 
the teacher in the assignment of marks in beginning algebra? 


REVIEW OF THE LITERATURE 


Garner‘ attempted to compare the marks assigned by men 
and women teachers. His data were obtained by investigating 
5,152 marks assigned to boys and 5,132 marks assigned to girls. 
He made no attempt to differentiate school subjects. He con-
cluded that both men and women give high marks to girls rather 
than to boys, that women sort students so that the boys get low 
marks. His study concluded that there is need for refining 
marks to make them more meaningful. 

Swenson? investigated the membership of the National Honor
Society at Lindsborg, Kansas, High School for the years 1932
to 1941. In his investigation he found that even though boys
out-numbered girls in class attendance for the ten-year period, 
girls outnumbered boys in the Honor Society by 2.75 to 1. He 
did not find substantial differences in the intelligence of boys and 
girls, but decided that membership was gained by inequalities 
in teachers’ marks. He concluded that teachers were prejudiced 
against boys. 

Three writers, Day,'! Douglass,? and Shinnerer,’ in three 
separate studies, for the years 1937, 1938, and 1944, concluded
that boys had more failures at the secondary-school level and that
girls had a consistent and generally substantial advantage over boys
in obtaining honor ranks. In the light of these investigations, it
seems probable that marks are determined by factors other than
achievement, especially marks assigned by women teachers, and
that these influences result in slight overrating of girls generally
and the particular underrating of boys by women teachers.

Newton,‘ in a study in 1942, reported that women gave higher 
grades than did men teachers in Central High School, Indian- 
apolis, Indiana. He inspected the grades on two hundred 
forty-six permanent record cards which had been assigned by 
twelve women teachers and twenty-six men teachers. The 
total number of grades inspected was 4,255. He made no 
effort to account for the difference, nor did he state whether the 
differences were significant. 

Edmiston* added additional evidence of sex differences in 
marks assigned by teachers. In the situation which he studied, 
the average grade for girls was 84.4 and for boys, 80.0. He 
further pointed out that women teachers gave the girls grades that 
averaged 5.4 points above those given to boys, while men 
teachers were less partial to the girls, giving them an average of 
only 3.4 points above those given to the boys. 

Lobaugh,® investigating the relationship of achievement and 
marks assigned by teachers, found that girls had a grade point 
average of 2.19 while the boys had a grade point average of only 
1.97. When he compared the scores made on the Myers-Ruch 
High School Progress Test, the boys’ median score was 46 while 
the median score for girls was 36. This ten-point differential was 
characteristic of all tests administered during the period from 
1940 to 1945. Further, in 1940 the valedictorian, a girl, could 
do no better than rank number 36, while it was necessary to go
down to number 105 to find the salutatorian. In 1941 the
valedictorian ranked number 19 while the salutatorian ranked 
number 41. On the 1940 test, the boy who ranked number 1 on 
the achievement test failed to graduate and had to return to 
school for the fifth year in order to graduate. The results in 
1940 showed that the top fourteen scores were made by boys. 
In 1941 and 1942 the results indicated that only three girls could 
be found among the top fifteen scores. Lobaugh accounted for 
the differences between achievement and marks on the basis of 
evidence that girls were more meticulous, more punctual, and 
neater about their work. He also recognized greater maturity 
among the girls and a tendency for the boys to compensate for 
their immaturity. 


MATERIALS AND SUBJECTS 


Results of an investigation of this type are of most value 
when they can be used for evaluation and interpretation over a 
wide area, or by a large number of individuals. With this in 
mind the investigation was undertaken in a city in western 
Pennsylvania. Two hundred sixty pupils took part in the 
testing program from which the basic data for this study were 
obtained. This investigation is based on two hundred thirty- 
five pupils taking high-school algebra for the first time. In all, 
nine classes were used, four classes being taught by women and 
five classes being taught by men. Of the students, one hundred 
thirty-five were boys and one hundred were girls. Since students 
are assigned to classes alphabetically, no known selective factors 
operate which would give a biased sample. 

In the school in which the investigation was made, there are 
six teachers, three men and three women, teaching beginning 
algebra. The six teachers all hold valid Permanent High School 
Certificates issued by the Pennsylvania Department of Public 
Instruction. None of the teachers have had less than fifteen 
years of experience. The training of the three men and the three 
women used in this study was almost identical. Whatever effect 
factors of age, training, and experience may have on assigned 
marks was minimized. 

During the last week of the first semester, the investigator 
administered the Otis Quick Scoring Mental Ability Test, Beta 
Test, Form A, and the Colvin-Schrammel Algebra Test, Test I, 
Form A, to all students enrolled in the course in beginning
algebra in the public school of a western Pennsylvania city. 
One week following the end of the first semester the examiner 
inspected the permanent record cards for the subjects used in the 
investigation. These cards were inspected in the office of the 
principal. From the cards, the examiner secured the necessary 
information concerning the marks assigned by individual teachers 
to each student as an indication of his level of achievement for 
the semester. 


RESULTS OF THE TESTING PROGRAM 


For the purpose of this investigation, the sample is divided 
into four groups: 1) Boys taught by men teachers. 2) Girls 
taught by men teachers. 3) Boys taught by women teachers. 
4) Girls taught by women teachers. 

In the presentation of the data which follow, the results of the 
testing program are presented so as to reflect these categories. 


A) INTELLIGENCE TEST RESULTS 


In Table I are shown the critical ratios of the differences of 
the various groups with respect to mental ability as measured 
by the Otis Test. The differences for the various groups (range 
1.62 to -.83), when treated statistically, give critical ratios
that range from .91 to .13. The largest difference shown, that 
between boys and girls taught by men (1.62), gives a critical 
ratio of only .91. Differences as large as 1.62 might be expected 
by chance one out of five times. It is important to realize, 
therefore, that with respect to intelligence, as measured by the 
Otis Quick-Scoring Mental Ability Test, there are no statis- 
tically significant differences between the groups in the present 
investigation. 
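
As a check on the arithmetic, the critical ratio for the first comparison in Table I can be retraced from the printed means and standard errors. The sketch below is an illustration, not the investigator's own computation. It assumes that the standard error of the difference is the square root of the sum of the squared standard errors of the two means, and that the chance statement rests on a one-tailed normal-curve probability. Both assumptions reproduce the printed figures; the function names are hypothetical.

```python
import math


def critical_ratio(mean_1, se_1, mean_2, se_2):
    """Return (difference, SE of the difference, critical ratio)."""
    diff = mean_1 - mean_2
    # Assumption: the two groups are independent, so the squared SEs add.
    se_diff = math.sqrt(se_1 ** 2 + se_2 ** 2)
    return diff, se_diff, diff / se_diff


def one_tailed_chance(cr):
    """Normal-curve probability of a deviation this large in one direction."""
    return 0.5 * math.erfc(abs(cr) / math.sqrt(2))


# Table I, taught by men: boys (mean 107.28, SE 1.36), girls (mean 108.90, SE 1.16).
diff, se_diff, cr = critical_ratio(107.28, 1.36, 108.90, 1.16)
p = one_tailed_chance(cr)
print(round(abs(diff), 2), round(se_diff, 2), round(abs(cr), 2), round(p, 2))
```

Computed as boys minus girls, the difference is negative; the table and text report its magnitude. The one-tailed probability comes to a little under one in five, in line with the statement above.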


B) ALGEBRA ACHIEVEMENT TEST SCORES 


The critical ratios of the differences between the means of the 
various groups based on the results of the Colvin-Schrammel 
Algebra Test are found in Table II. Differences range from 1.69 
(the difference in mean scores for boys and girls taught by men) 
to .17 (the difference between groups taught by men and by 
women). It is to be noted that the average boy makes a better
score on this test than does the average girl. The mean score
for boys taught by men is higher than the mean score for girls
taught by men. On the other hand, the average score of girls
taught by women exceeds the average score of boys taught by
women. Students whose sex is the same as that of the teacher
make higher mean scores than do students whose sex is opposite
to that of the teacher.


TABLE I.—DIFFERENCES AND CRITICAL RATIOS BETWEEN
SCORES MADE BY BOYS AND GIRLS ON THE OTIS QUICK
SCORING MENTAL ABILITY TESTS, BETA TEST


                                           Diff.
  N   Sex         Mean      SD     SEm   (M1 - M2)   SEd     CR
Taught by Men
 75   Boys       107.28   11.59   1.36
 58   Girls      108.90    8.84   1.16
                                            1.62     1.79    .91
Taught by Women
 60   Boys       107.60    8.15   1.05
 42   Girls      107.38    9.14   1.41
                                             .22     1.76    .13
Boys (Taught by)
 75   Men        107.28   11.59   1.36
 60   Women      107.60    8.15   1.05
                                            -.32     1.72   -.19
Girls (Taught by)
 58   Men        108.90    8.84   1.16
 42   Women      107.38    9.14   1.41
                                            1.52     1.83    .83
Totals (Taught by)
133   Men        107.97   10.64    .92
102   Women      107.51    8.57    .85
                                             .46     1.25    .37
Totals
135   All Boys   107.42   10.34    .89
100   All Girls  108.25    8.99    .90
                                            -.83     1.26   -.66


With respect to average achievement in algebra, as measured 
by the Colvin-Schrammel Test, the differences among the various 
groups in this investigation are not significant. The largest 
critical ratio, the critical ratio between the mean score for boys 
and girls taught by men, 1.13, indicates that differences as large 
as 1.69 could be expected by chance thirteen times out of one 
hundred. It must be concluded, then, that the small differences
among the various groups could happen by chance, and, as far
as ability in algebra is concerned, boys and girls, whether the
teacher is a man or a woman, show equal algebra achievement
within the limits of the present data.


TABLE II.—MEAN DIFFERENCES AND CRITICAL RATIOS BETWEEN
SCORES MADE BY BOYS AND GIRLS ON THE COLVIN-SCHRAMMEL
ALGEBRA ACHIEVEMENT TEST, FORM A


                                         Diff.
  N   Sex        Mean     SD     SEm   (M1 - M2)   SEd     CR
Taught by Men
 75   Boys       30.84   9.32   1.08
 58   Girls      29.15   7.74   1.02
                                          1.69     1.49   1.13
Taught by Women
 60   Boys       29.51   8.42   1.09
 42   Girls      30.52   7.19   1.11
                                         -1.01     1.56   -.65
Boys (Taught by)
 75   Men        30.84   9.32   1.08
 60   Women      29.51   8.42   1.09
                                          1.33     1.54    .86
Girls (Taught by)
 58   Men        29.15   7.74   1.02
 42   Women      30.52   7.19   1.11
                                         -1.37     1.51   -.91
Totals (Taught by)
133   Men        30.10   8.70    .75
102   Women      29.93   [illegible]  .67
                                           .17     1.00    .17
Totals
135   All Boys   30.25   8.18    .70
100   All Girls  29.73   7.54    .75
                                           .52     1.02    .51


C) MARKS ASSIGNED BY TEACHERS 


The critical ratios of the differences between the mean grades
assigned to boys and girls by teachers of beginning algebra are
found in Table III. Although it has been shown that no significant
differences exist among the various groups in either intelligence
or algebra achievement, significant differences are found in the
marks assigned by teachers of beginning algebra. The average mark
assigned by men is 6.44 points lower than the average mark
assigned by women. This difference is nearly six times the
standard error of the difference. This cannot be attributed to
chance factors. In all instances the girls receive higher average
marks than do the boys. In the present investigation, the
difference in marks assigned by men and women teachers indicates
that men assign lower marks to boys and to girls.


TABLE III.—DIFFERENCES AND CRITICAL RATIOS BETWEEN
MARKS ASSIGNED TO BOYS AND TO GIRLS BY TEACHERS OF
BEGINNING ALGEBRA


                                         Diff.
  N   Sex        Mean     SD     SEm   (M1 - M2)   SEd      CR
Taught by Men
 75   Boys       76.61   8.19    .95
 58   Girls      79.50   6.60    .87
                                         -2.89     1.29   -2.24
Taught by Women
 60   Boys       82.63   9.08   1.17
 42   Girls      86.71   8.67   1.34
                                         -4.08     1.78   -2.29
Boys (Taught by)
 75   Men        76.61   8.19    .95
 60   Women      82.63   9.08   1.16
                                         -6.02     1.50   -4.01
Girls (Taught by)
 58   Men        79.50   6.60    .87
 42   Women      86.71   8.67   1.34
                                         -7.21     1.60   -4.51
Totals (Taught by)
133   Men        77.87   7.67    .67
102   Women      84.31   9.15    .91
                                         -6.44     1.13   -5.70
Totals
135   All Boys   79.29   9.10    .78
100   All Girls  82.53   8.34    .83
                                         -3.24     1.14   -2.84


More specifically, the data show that boys are given lower average
marks than are girls, regardless of the sex of the teacher
assigning the marks; but, marks assigned by men are lower than
those assigned by women. Consequently, boys get the lowest
average marks when those marks are assigned by men. Girls,
on the other hand, get the highest marks when those marks are
assigned by women teachers.


D) RELATIONSHIP BETWEEN TEACHERS’ MARKS AND ACHIEVEMENT
SCORES


In Table IV are presented the coefficients of correlation, both 
zero order or product moment r’s and partial r’s of the first order. 
The product moment r’s reflect the relationship between teachers’
marks and algebra achievement. The partial r’s reflect the 
relationship between teachers’ marks and algebra achievement 
with the effect of intelligence held constant. 


TABLE IV.—COEFFICIENTS OF CORRELATION BETWEEN TEACHERS’
MARKS AND ALGEBRA ACHIEVEMENT AND BETWEEN TEACHERS’
MARKS AND ALGEBRA ACHIEVEMENT WITH INTELLIGENCE
HELD CONSTANT


0. Teachers’ Marks
1. Algebra Achievement Scores
2. Intelligence Test Scores


                      Boys          Girls         Total
Taught by Men
  Number               75            58            133
  r01.2            .70 ± .06     .47 ± .10     .57 ± .06
  r01              .78 ± .05     .57 ± .09     .68 ± .05
Taught by Women
  Number               60            42            102
  r01.2            .28 ± .12     .18 ± .15     .31 ± .09
  r01              [illegible]   .35 ± .14     .43 ± .08
Totals
  Number              135           100            235
  r01.2            .47 ± .07     .36 ± .09     .40 ± .05
  r01              .59 ± .06     .45 ± .08     .52 ± .05


When intelligence is held constant and teachers’ marks are
compared with algebra achievement, the first order partial r is
smaller than the zero order r in all groups. Large changes are
found in the totals (N equal to 235), where the product moment r
is .52 and the partial r is .40, and among all boys (N equal to 135),
where the value of the r changes from .59 to .47. The largest
change, however, is observed in comparing the coefficients for
girls taught by women (N equal to 42). When teachers’ marks
are compared with achievement scores, the zero order r is .35.
When intelligence is held constant, and these two variables are
compared, the coefficient is reduced to .18.

When intelligence is held constant and teachers’ marks are
compared with algebra achievement scores, the reduction in the
magnitude of the r’s indicates that teachers’ marks reflect not
only achievement but also, at least in part, intelligence. This
effect is not equally strong in all groups. The grades assigned
to boys by men teachers are not as greatly affected as are the
grades assigned to girls by women teachers. As is to be expected,
however, intelligence is a factor in the assignment of marks by
both men and women teachers to both boys and girls.
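
The first order partial r's in Table IV follow the usual formula, r01.2 = (r01 - r02 r12) / sqrt((1 - r02^2)(1 - r12^2)). The sketch below illustrates the computation only. The correlations of intelligence with marks (r02) and with achievement (r12) are not reported in this study, so the two values used for them are invented and should not be read as findings; the function name is likewise hypothetical.

```python
import math


def partial_r(r01, r02, r12):
    """First order partial correlation of variables 0 and 1 with 2 held constant."""
    return (r01 - r02 * r12) / math.sqrt((1 - r02 ** 2) * (1 - r12 ** 2))


# Reported zero order r between marks and achievement for all 235 pupils.
r01 = 0.52

# NOT reported in the study: invented values, used only for illustration.
r02_assumed = 0.45  # teachers' marks vs. intelligence (hypothetical)
r12_assumed = 0.45  # algebra achievement vs. intelligence (hypothetical)

r01_2 = partial_r(r01, r02_assumed, r12_assumed)
print(round(r01_2, 2))
```

With these invented inputs the partial r falls below the zero order r, which is the direction of change described above. How far it falls depends entirely on the two correlations with intelligence.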


SUMMARY 


With respect to intelligence, no significant differences existed 
among any of the groups. In the results of the algebra achieve- 
ment scores, small and, on the whole, insignificant differences 
favored the group whose sex was the same as the sex of the teacher. 
The differences in achievement were not significant at the one per 
cent level of confidence, indicating that the small differences 
which were present could have been accounted for by chance. 

When the teachers’ marks in beginning algebra were investi- 
gated, significant differences were observed. Girls made signifi- 
cantly higher marks than did the boys. Women teachers tended 
to give higher marks than did the men teachers. Specifically, 
when marks were assigned, boys were given lower marks than 
were the girls, regardless of whether the teacher was a man or a 
woman; but, marks assigned by men teachers were lower than 
marks assigned by women teachers. 


CONCLUSIONS 


1) It was evident from the data, although no significant differ- 
ences could be found in intelligence or in algebra achievement, 
that significant differences existed in the marks assigned by 
teachers, differences clearly not attributable to chance. The 
differences, generally, gave the advantage to the girls. It was 
made clear that the girls were no smarter, did not know any more 
algebra, but they did receive higher marks.

2) There were definite indications that intelligence was a factor
in the assignment of marks. The correlation coefficient between 
teachers’ marks and algebra achievement gave some indication 
that, theoretically, at least, they represent measurement of the 
same variable. 

3) When intelligence was partialled out, and thus held con- 
stant, the relationship between teachers’ marks and achievement 
declined. This indicated that the teachers’ marks not only 
reflected achievement but also intelligence. Since the relation- 
ship was far from perfect, some other factors entered into the 
assignment of marks by teachers of beginning algebra. 

4) It must be concluded that teachers’ marks represent more 
than chance estimates of the pupils’ achievement. The findings 
in the present investigation indicate that teachers’ marks repre- 
sent achievement, but, and this is important, they give evidence 
of the effects of intelligence upon the teacher. 

5) It must also be concluded in the light of the data in the 
present investigation that the sex of the teacher was not so 
important in the investigation of marks as was the sex of the 
student. Regardless of whether the teacher was a man or a 
woman, boys were penalized in the assignment of marks. The 
penalty was not so great, at least so far as these data were con- 
cerned, if the teacher was a man. There was higher correlation 
between achievement and teachers’ marks when the teacher was a 
man. 

6) The data indicated a definite necessity for the refining of 
marks, if these marks are to reflect true achievement. The data 
used in this investigation proved that there is a slight overrating 
of girls generally and an underrating of boys, especially by women 
teachers. 


RECOMMENDATIONS FOR FUTURE STUDIES 


The evidence in the present investigation indicated that the 
mark assigned by teachers reflected more than algebra achieve- 
ment. From the evidence at hand it was impossible to account 
for all of the factors that affected the grading or marking situa- 
tion. In view of the data upon which the conclusions of the 
present investigations were drawn, further investigation should be 
directed toward finding answers to the following questions:

1) What are the effects of interest, socio-economic status and
personality of the student on the assignment of marks by teachers 
of beginning algebra? 

2) Are the factors mentioned above important in the grading 
situation when investigated in light of sex differences of student 
and teacher? 

3) Of what significance are non-intellectual factors when teachers
assign marks?


THE EFFECTIVENESS OF WORK IN REMEDIAL
READING AT THE COLLEGE LEVEL*


WALTER B. BARBE
Kent State University


INTRODUCTION 


Even though Frances Triggs⁵ reported in 1943 that twenty per
cent of entering college students read less efficiently than did the 
average eighth-grade pupil, colleges have still been reluctant to 
offer any form of a remedial or corrective reading program. In 
the fall of 1950, Baylor students who felt that they needed 
improvement in reading were tested. Differences in reading rate 
of as much as 14,000 words an hour were found to exist. This 
great variation in the reading ability of college students was even 
more clearly demonstrated in a study at the University of Chicago 
by Ivan A. Brooker¹ where all of the entering students were tested
and differences as great as 18,000 words an hour were found. 

In Remedial Reading at the College and Adult Levels: An Experimental
Study, Guy T. Buswell² stated that: “ . . . by the sixth
grade of the elementary school, children can master the basic
factors of the reading process so well that, for material within
their range of experience and within their vocabulary, they can
read with as much speed and with as full understanding as adults
can read the same kind of material.”

This would seem to indicate that the problem of poor reading 
is not one which rightfully belongs on the college level. Since 
poor readers are reaching the college level, however, the problem 
must either be faced realistically or avoided at the expense of the 
students’ failing to obtain the most from college. 

Norman Lewis³ reported on a number of instances where
clinics have had success in improving the reading ability of college 
students. The University of Florida, Dartmouth, City College 
of New York, and the Air University at Maxwell Field, Alabama, 
are cited as examples. 

In a recent article Dorothy McGinnis⁴ reported on corrective
reading work at the college level. She found in her study that
not only did students make significant gains in reading after 
remedial work, but they also made higher grade-point averages. 

A study by Paul Witty⁷ in 1940 reported that remedial reading
work at the college level was being offered only on a limited scale.
This condition probably has improved somewhat today, but the
great majority of students are still not being reached. Frances
Triggs⁶ stated that she believed one of the reasons colleges were
reluctant to offer remedial reading work was “the lack of any
clear-cut demonstrations that remedial programs really do
improve student’s reading abilities.”


PURPOSE 


It is the purpose of this study to determine: (1) the gains which 
can be made in remedial reading work at the college level, (2) the 
relative permanency of any such gains, and (3) the significance 
of any change which might occur in college grades following 
remedial reading work. 


METHODS 


Subjects.—Fifty subjects were used in the entire experiment. 
They ranged in classification from college freshmen to senior law 
students. The subjects were selected from those students who 
expressed a desire to improve their reading ability. The experi- 
mental group was made up of the first twenty-five students who 
reported for a reading test and were able to fit a section meeting 
one hour each day, five days a week, into their schedule. The 
first twenty-five students who reported for the reading test but 
were unable to schedule a reading improvement section made up 
the control group. The purpose of the second, or control, group 
was to demonstrate whether the gains made by the experimental 
group were due to the remedial work or merely to the time spent 
in college. The experimental group actually acted as its own 
control, the results of the first test being compared with the results 
of the second and final tests. 

The twenty-five subjects in the experimental group were di- 
vided into five subgroups, each group meeting at a different hour. 
It was felt that sufficient individual attention could be given to 
each subject if the size of the group was limited to five. 

Two of the experimental subjects were working on reading
improvement following academic expulsion due to grade-point
deficiencies caused by ‘poor reading ability.’ It is questionable 
whether one of these was actually a reading problem. In terms 
of his intellectual ability, his reading was satisfactory. 

Materials.—A wide variety of materials was used with the
experimental group. The subject was not held in any way to a 
prescribed reading list. He was encouraged to read materials in 
which he had an interest. When the subject was unable to 
locate materials which interested him, as was often the case, a 
number of books were suggested. Such books as Short Stories by 
Saki (Munro) and Our Miss Boo by Runbeck were very popular 
with the majority of the group. Religious biographies were also 
very popular. This was due no doubt to the religious affiliation 
of the subjects. Reading of this type was done during super- 
vised reading periods on SRA Reading Accelerators. 

Since the primary emphasis of the reading improvement work 
was to increase reading rate without loss in comprehension, 
testing materials were difficult to locate. The reading sections 
from the Iowa Silent Reading Tests were finally selected. Only 
Test I, Reading Rate and Comprehension, was administered. 
On Part A of Test I, the subject was allowed to read for one min- 
ute. At the end of that time he was questioned about the 
material. The subject was scored on only those questions con- 
cerning material which he had read. In this way it was possible 
to determine how well the slow reader was comprehending, with- 
out penalizing him on his comprehension score by asking ques- 
tions over material which he had not had time to read. The rate 
was determined by counting the number of words read. This was
repeated on Part B of Test I and an average of the two rates was
taken to be the reading rate at that testing. Following this
procedure, Forms Am, Bm, and Dm of the Iowa Silent Reading
Tests were administered at the initial, second, and final testings,
respectively.
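
The scoring procedure just described can be sketched in a few lines. This is an illustration under stated assumptions, not the scoring key of the Iowa Silent Reading Tests: the record layout, the field names, and the sample figures are all invented, and each question is assumed to be tied to the point in the passage by which its material has been presented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Question:
    # Hypothetical field: word position by which the passage has presented
    # the material this question asks about.
    covered_by_word: int
    answered_correctly: bool


def reading_rate(words_part_a: int, words_part_b: int) -> float:
    """Average the words read in the two one-minute trials (Parts A and B)."""
    return (words_part_a + words_part_b) / 2


def comprehension(questions: List[Question], words_read: int) -> Optional[float]:
    """Proportion correct among questions on material the subject reached."""
    reached = [q for q in questions if q.covered_by_word <= words_read]
    if not reached:
        return None  # the subject did not read far enough to be questioned
    return sum(q.answered_correctly for q in reached) / len(reached)


# Invented example: a slow reader who covers 180 words of Part A in one minute.
part_a_questions = [
    Question(60, True),
    Question(120, True),
    Question(170, False),
    Question(240, True),  # beyond the point reached, so not scored
]
rate = reading_rate(180, 200)
score = comprehension(part_a_questions, 180)
print(rate, score)
```

Leaving the fourth question out of the denominator is what keeps the slow reader's comprehension score from being lowered by material he never had time to read.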

The Michigan Vocabulary Test was administered to determine 
the extent of vocabulary difficulty. Thirty Days to a More Power- 
ful Vocabulary by Norman Lewis and Wilfred Funk was used 
extensively in an effort to remedy this difficulty. The Personality 
Inventory by Robert G. Bernreuter was administered to deter- 
mine personal adjustment. It is likely that the counseling aspect 
of the program had an influence on the reading improvement of 
the subjects. No effort was made, however, to determine the
effect of the counseling as separated from the actual reading
improvement work. 

The Otis Self-Administering Tests of Mental Ability were given 
to the entire experimental group. Where there was any question 
concerning the score on the Otis, the Wechsler-Bellevue test was 
given. 

It was the purpose of these tests to assist in diagnosing the read- 
ing difficulty. It was believed that best results could be achieved 
by having as good an understanding as possible of each individual 
and his problem. 

Two workbooks were used for week-to-week checks of reading 
progress. A Manual of Reading Exercises for Freshmen by 
Luella C. Pressey was used regularly. Study Type of Reading 
Exercises by Ruth Strang was also used, but much less extensively. 

Apparatus.—Six SRA Reading Accelerators were used to con- 
trol and increase the reading rate of the subjects in the experi- 
mental group. Although the subjects were able to determine for 
themselves the rate at which they were reading on the machines, 
no tests were ever administered on the machines. It was found 
that external noises were very distracting to the subjects while 
they were reading on the machines. In order to eliminate these 
distractions, individual booths were constructed for each machine.

Individual procedure.—While the subjects in the experimental
group met in subgroups of five, every effort was made to provide 
an individual approach to each subject’s reading problem. Tests 
from Pressey’s A Manual of Reading Exercises for Freshmen were 
administered weekly. This provided regular checks on reading 
rate and comprehension. Frequent checks were necessary 
because of the artificial rates which the subjects developed on the 
accelerators. On several occasions it was found that a subject 
was reading nearly twice as fast on the accelerator as he was able 
to read on the weekly test. This indicated that the subject was 
sacrificing comprehension for speed on the accelerator. It was 
repeatedly emphasized that comprehension was the most important
factor. This might account for the consistently satisfactory
comprehension scores and the lack of any phenomenal gains in 
rate at the expense of comprehension. 

The results of the weekly reading test did much toward deciding 
what was to be done by the individual subject for the next week. 
When a subject increased in rate and maintained a high
comprehension score, he continued with his program as it was.
If his comprehension fell, an effort was made to determine the
reason. Each schedule was prepared individually and was
flexible enough to put emphasis upon the difficulty
experienced by the subject at that time. 

Much time was spent in individual counseling. This involved 
assisting with schedules, study habits, personal problems, etc. 

Each subject had an individual folder where his test results and 
progress chart were kept. Once each week, following the weekly 
test, the subject prepared the progress chart and discussed the 
past week’s work with the examiner. At this time the schedule 
for the coming week’s work was prepared. 

Testing procedure.—Initially each subject in both the experi- 
mental and control group was tested for reading rate and compre- 
hension on Form Am of the Iowa Silent Reading Tests. The con- 
trol group was not contacted again until after the experimental 
group had undergone twelve weeks of reading improvement work. 
Both groups were then retested; this time on Form Bm of the 
Iowa Tests. To determine if the results of the reading improvement
work were still significant six months later, both groups
were again retested, this time on Form Dm of the Iowa Tests. 

In an effort to determine whether reading improvement had any 
effect upon grades received in college courses, the grade-point 
averages of the subjects in the experimental group and in the 
control group for the Winter Quarter, 1950, were compared with 
their grade-point averages for the Winter Quarter, 1951. The 
reading improvement work was carried on during the Fall 
Quarter, 1950. Only eighteen subjects in the experimental group 
and sixteen in the control group had been in school both quarters 
and could be used in this phase of the study. 


RESULTS 


Data found in this study have been tabulated in Tables 1, 2, 3, 
and 4. The mean rate of the experimental group at the initial 
testing was 213 words per minute. Twelve weeks later, on the
second testing, the mean rate was 350 words per minute, an in- 
crease of sixty-four per cent. This was a very significant 
increase. 

The control group, which originally had a mean reading rate
of 246 words per minute, increased to 250 words per minute on the
second test, an increase of about two per cent. This increase
was not significant.

At the time of the final testing, six months had elapsed since the
end of the reading instruction. It was intended to demonstrate
by this testing that the reading improvement had been permanent
enough to still be significant. The mean of the experimental
group, in the final testing, was 318 words per minute; an increase
of forty-nine per cent over the initial mean of 213 words per
minute. While there had been some decrease in reading rate
during the six months between the second and final testing, the
difference between the means of the initial and final tests was still
very significant. The control group had increased from 246
words per minute to 253 words per minute, about three per cent,
in the nine months between the initial and final tests. This is not
a statistically significant change.


TABLE 1.—DATA FOR READING RATE OF EXPERIMENTAL GROUP
AT EACH TESTING


                         Initial Test   Second Test   Final Test
                           (Sept.)        (Dec.)        (June)
Number of Subjects            25             25            25
Mean Rate (Words per
  Minute)                    213            350           318
SE of Mean                  13.50          18.32         16.68


Standard error of the difference of the means on the initial and second test
is 22.76, t = 6.02; very significant.

Standard error of the difference of the means on the initial and final test is
21.46, t = 4.89; very significant.


TABLE 2.—DATA FOR READING RATE OF CONTROL GROUP AT
EACH TESTING


                         Initial Test   Second Test   Final Test
                           (Sept.)        (Dec.)        (June)
Number of Subjects            25             25            25
Mean Rate (Words per
  Minute)                    246            250           253
SE of Mean                  14.18          13.93         13.67


Standard error of the difference of the means on the initial and second test
is 19.88, t = .20; not significant.

Standard error of the difference of the means on the initial and final test is
19.70, t = .36; not significant.


In Table 3 the grade-point average of the subjects in the experimental
group in the Winter Quarter, 1950, before reading instruction,
is compared with their grade-point average in the Winter
Quarter, 1951, after reading instruction. Only eighteen of the
group were in school both quarters and could be used in this
phase of the study. Where the mean grade-point average was
1.495 in the Winter, 1950, it had increased to 1.95 in the Winter,
1951. This increase is significant at the .05 level, but not at
the .01 level.

In Table 4 the same comparison is made for the control group.
The change in grade-point average of the control group was not
found to be significant.


TABLE 3.—GRADE-POINT AVERAGES OF EXPERIMENTAL GROUP
BEFORE AND AFTER READING INSTRUCTION


                         Winter Quarter   Winter Quarter
                              1950             1951
Number of Subjects             18               18
Mean                          1.495            1.95
SE of Mean                     .17              .17


The standard error of the difference of the two means is .24, t = 1.9;
significant at the .05 level, but not at the .01 level.


TABLE 4.—GRADE-POINT AVERAGES OF CONTROL GROUP BEFORE
AND AFTER READING INSTRUCTION


                         Winter Quarter   Winter Quarter
                              1950             1951
Number of Subjects             16               16
Mean                          1.46             1.48
SE of Mean                     .15              .17


The standard error of the difference of the two means is .23, t = .09; not
significant.
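
The percentages and t values reported for reading rate in Tables 1 and 2 can be retraced from the printed means and standard errors. The sketch below is a check on the arithmetic rather than the author's own computation. It assumes that the standard error of a difference was obtained by combining the two standard errors of the means as though the testings were independent, which is the assumption that reproduces the printed 22.76 and 21.46. The function names are illustrative.

```python
import math


def percent_gain(before, after):
    """Percentage change from the initial mean."""
    return 100 * (after - before) / before


def t_from_standard_errors(mean_1, se_1, mean_2, se_2):
    """Return (SE of the difference, t), treating the two means as independent."""
    se_diff = math.sqrt(se_1 ** 2 + se_2 ** 2)
    return se_diff, (mean_2 - mean_1) / se_diff


# (mean words per minute, SE of mean) as printed in Tables 1 and 2.
groups = {
    "experimental": {"initial": (213, 13.50), "second": (350, 18.32), "final": (318, 16.68)},
    "control": {"initial": (246, 14.18), "second": (250, 13.93), "final": (253, 13.67)},
}

results = {}
for name, tests in groups.items():
    m0, se0 = tests["initial"]
    for testing in ("second", "final"):
        m1, se1 = tests[testing]
        se_diff, t = t_from_standard_errors(m0, se0, m1, se1)
        results[(name, testing)] = (
            round(percent_gain(m0, m1)),
            round(se_diff, 2),
            round(t, 2),
        )
        print(name, testing, results[(name, testing)])
```

Because the same subjects were tested on each occasion, a paired comparison would also be possible, but it would require the individual scores, which are not reported here.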


SUMMARY AND CONCLUSIONS 


Fifty college students were chosen to serve as subjects in an 
experiment to determine the improvement which could be made 
in reading, the permanency of any gains which could be made, 
and the possible effect which such improvement would have on 
the subjects’ grades.

Twenty-five of the subjects, who made up the experimental
group, worked for twelve weeks to improve their reading ability. 
The remaining twenty-five subjects, who made up the control 
group, were tested for reading ability but were given no help in 
reading improvement. At the end of twelve weeks, both groups 
were retested. 

The mean reading rate of the experimental group had increased 
sixty-four per cent. The mean reading rate of the control group 
had increased only two per cent. This was a statistically signif- 
icant increase in the experimental group but not in the control 
group. 

On the final test, six months after the remedial work had been 
discontinued, the experimental subjects were still reading forty- 
nine per cent faster than they had been on the initial test. Mem- 
bers of the control group were reading three per cent faster, an 
increase which was not statistically significant. 

A comparison of the grade-point averages of the subjects in the 
experimental group before and after reading instruction indicated 
that the improvement was significant at the .05 level, but not at 
the .01 level. The change for the subjects in the control group 
was not significant. 

From the data, it may be seen that for the subjects used in this 
experiment: (1) significant gains were made in remedial reading 
work at the college level which should emphasize the value of such 
a program, (2) the gains which were made were still significant 
six months after the end of the remedial work, indicating relative 
permanency, and, (3) the grade-point average of the experimental 
group showed an improvement significant at the .05 level, indi- 
cating some positive value of remedial reading work in improving 
students’ grades.


THE TEST CONSTRUCTOR’S RESPONSIBILITY


DOROTHEA W. F. EWERS 
Chicago Board of Education 


There are four kinds of tests—tests which look good; tests 
which have assumed validity; tests with content validity; and 
tests with statistical validity. 

Tests which look good are the guesses people make as to what 
might work in a particular situation, but which have not been 
tried out. Tests which have assumed validity are those con- 
structed on the basis of factor analysis studies and the like. The 
material in them is based on more than just a new idea. Tests 
which are content valid are achievement tests. They are tests 
which are face valid. Face validity implies that the judgment of 
an expert in the field covered by the test is better than the judg- 
ment of recognized experts concerning the abilities of individuals. 
Tests which are statistically valid are those which have been 
found to discriminate between good and poor groups on the basis 
of either an internal criterion or an external one. 

Tests which look good and tests which have assumed validity 
should be tried out and proved to have statistical validity before 
they are released for service use. Aptitude tests come in these 
categories. Tests which are content valid may be released with- 
out determination of their statistical validity, but probably 
should not be released unless item analyzed if they are built by 
naive test constructors. Statistical validity, while superior in
the case of aptitude tests, is not superior in the case of achieve- 
ment tests. 

For purposes of convenience in discussion, it may be assumed 
that any very large test construction outfit constructs five varie- 
ties of tests: 

Experimental tests.—Tests which are not known to have content
validity but which look as though they might be useful as pre- 
dictors in certain general areas. These tests must be statistically 
validated before they can be released. 

Provisional and Preliminary tests.—Tests which are content
valid and thus safe to release for use without statistical validation. 
These are built to cover particular manuals or particular subject 
matter, the coverage of which in the test can be attested to by
subject-matter experts. These achievement tests are the only
tests which should be attempted by inexpert test constructors who
do not have background in test construction principles or opportunity
to run try-outs and knowledge of or opportunity to do item
analysis. These tests even when constructed by expert test con- 
structors may have a good deal of ‘dead wood’ in them—that is, 
some items may be too easy and some may be too hard and so not 
discriminate among the group for which the test is intended. 
When they are built by other than test construction experts, how- 
ever, they may be full of flukes, some of which may be caught 
by item analysis and many of which will be recognized only 
after long experience in constructing tests of all sorts. 

Provisional tests are released pending better tests covering 
the same area. They may be recalled at any time. Usually, 
however, when they are built they are expected to be of limited 
circulation and there is no intention of recalling or revising them. 
They usually cover certain specialized information known and 
used by relatively few persons. 

Preliminary tests are early editions of tests which are expected 
to be widely used and which are thus to be tried out and item 
analyzed so that the dead wood may be deleted in order to save 
the time of examinees and administrators. 

Final tests.—Tests made from preliminary tests or experimental 
tests which have proved valid and which have been 
revised on the basis of item analysis. These tests have content 
validity and it is customary to determine their statistical validity 
also since internal criterion data are always present after admin- 
istration and external criterion data are usually obtained. 

Standardized tests.—Final tests for which norms for particular 
purposes have been set up and which also have been validated for 
particular purposes. 

While a rating or a personal opinion cannot compare with the 
accuracy of measurement by a good test, personal opinion is far 
better than a poor test. Personal opinion is likely to be tem- 
pered. It is also likely to be taken as personal opinion. A test 
score is too often apt to be believed and acted upon and is too 
easily misinterpreted for one to afford to use a score obtained on a 
poor test. 

When it comes to test construction, preferably no test constructor 
should work alone. In addition to the subject-matter 
expert, an expert in test construction should go over each test in 
detail, and an expert statistician familiar with the testing field 
should be available as a consultant. The test construction 
expert should be a person with special qualifications. He should 
be one whose interests in persons and things have led him to 
explore them. It is necessary to know the language in which 
various groups of people express themselves; what things people 
in different levels of society think are funny or vulgar; what the 
prejudices, biases, limitations, special interests, and capabilities 
of various groups may be. He has to have seen or worked in 
many kinds of jobs and have had or be able to get special training 
in the analysis of human abilities and of jobs. He has to be able 
to change his ‘set’ readily and to see each situation from more 
than one and preferably several points of view. Before he 
writes directions for all sorts of people, he must be able to write 
simply and well—to write directions which can be talked, not 
read. He should have administered tests to all sorts of people— 
old and young, literate and illiterate, foreign-born and native- 
born, educated and not educated, deaf and blind, slow and 
superior and normal, defective and non-defective, socialites 
and plodders, sane and psychotic. He should have administered 
not only paper-and-pencil tests, but also performance tests and to 
groups of people as well as individuals so that he can anticipate 
the problems which arise and tell administrators what to expect 
and what to do. 

It is possible to write simple English very well and still to 
completely mislead aliens just beginning to learn English, as those 
who have read the appendix to Mencken’s The American Language 
or who have made friends among immigrants well know. It is 
possible to write an excellent test which is of no use simply 
because while the people for whom it is intended could be trained 
to do the job by oral instruction and by demonstration, they are 
unable to read the test; and those who could read the test and 
pass it would be bored on the job and thus be failures. In such a 
case, the test is not at fault but the test constructor is. 

The work of the subject-matter expert and of the statistician 
is not the topic under discussion here. The psychological test 
construction expert operates first to see another’s test from a fresh 
point of view if not from an entirely different set. Then he 
checks to see that test construction principles have been carried 
out as applicable. All this is entirely apart from editing for 
capitals, compounds, numerals, punctuation, etc. The test con- 
struction expert knows the theory of mental tests. He knows the 
difference between tests which look good and tests which have 
content validity. He knows what he wants to measure and 
whom he wants to measure. He knows the difference in the scor- 
ing formulas to be used with untimed power tests, timed power 
tests, and speed tests, and how the use of these formulas affects 
the directions which should be given to the subjects (over and 
above how to do the items and mark the answers). He knows 
when a stop-watch must be used and when a watch with a sweep 
second hand is sufficient and when neither is necessary. He knows 
the differences which should be apparent in a test built to pass or 
fail people and one built to rank those same people, and the dif- 
ferences in results of a test applied to a small group as opposed 
to a large group. He knows the respective merits of various 
types of form of item. He knows what test scores can logically 
be combined and how to combine them, how tests are interpreted, 
how to set up local and national norms, and when to use each and 
how to interpret them. 

Such an expert will check the coverage of a test, thus operat- 
ing as a check against the judgment of the subject-matter expert. 
The test construction expert will check to see that each lead 
states a problem and tells the point of view from which it is to be 
answered, that so far as possible the items are so formulated that 
they test only one factor, that each item has only one answer, 
that directions tell whether best or right answer is expected, that 
items are functional rather than background, that “unnecessary 
technical terms or obscure minutiae” are eliminated, that items 
are not wordy, that no trick or catch items are included, that the 
maximum number of items per minute have been provided, etc. 
He watches to see that “subject matter is appropriate, not moralizing 
nor inviting ridicule nor derogatory to any group of honest 
individuals.” He sees that the flavoring is appropriate—that 
roads are not built which engineers would not accept. He refers 
opinionated views to their source, and so on according to the 
body of knowledge regarding test construction which has been 
built up. 

He scrutinizes extremely carefully the devices which require 
the subject to identify and correct errors; he tries to get 
alternatives of items into parallel construction; considers the problems of 
specific determiners and dichotomies, etc. Then, if he has not 
already done so, he works every problem and checks the key, and 
lastly checks to see that items and alternatives have been properly 
randomized and that advantage has been taken of all possible 
ways of reducing test time, and looks to see that the proper num- 
ber of right and wrong responses are required. 

Item analysis is a valuable check on certain aspects of any 
test as applied to the group to which it has been given. It 
indicates whether the test is too hard or too easy, it may show 
that an item has more than one answer or can so be interpreted, 
it may indicate that the scoring key is wrong; but special training 
and experience are required beyond ability to use item analysis. 
It is this knowledge that makes a particular test constructor an 
expert. Almost anybody can write items, be they good or poor— 
not everybody can write or even recognize a poor test. 
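
The two checks named above, whether an item is too hard or too easy and whether it separates good from poor examinees, can be illustrated with a short sketch in Python. It is offered only as an illustration under stated assumptions: the function name `item_analysis`, the scoring of responses as 1 (right) or 0 (wrong), and the use of upper and lower scoring groups of 27 per cent are conventions chosen here and are not procedures described in this article.

```python
from typing import List, Tuple


def item_analysis(responses: List[List[int]],
                  fraction: float = 0.27) -> List[Tuple[float, float]]:
    """Return (difficulty, discrimination) for each item.

    responses[e][i] is 1 if examinee e answered item i correctly, else 0.
    Difficulty is the proportion of all examinees passing the item.
    Discrimination is the proportion passing in the upper scoring group
    minus the proportion passing in the lower scoring group.
    The 27 per cent default is an assumed convention, not from the article.
    """
    n_examinees = len(responses)
    n_items = len(responses[0])

    # Rank examinees by total score, highest first (the internal criterion).
    ranked = sorted(responses, key=sum, reverse=True)
    group_size = max(1, round(fraction * n_examinees))
    upper = ranked[:group_size]
    lower = ranked[-group_size:]

    results = []
    for i in range(n_items):
        difficulty = sum(row[i] for row in responses) / n_examinees
        p_upper = sum(row[i] for row in upper) / group_size
        p_lower = sum(row[i] for row in lower) / group_size
        results.append((difficulty, p_upper - p_lower))
    return results


# Hypothetical data: four examinees, three items.
sample = [
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 0],
    [1, 0, 1],
]
for number, (difficulty, discrimination) in enumerate(
        item_analysis(sample, fraction=0.5), start=1):
    print(f"item {number}: difficulty={difficulty:.2f}, "
          f"discrimination={discrimination:.2f}")
```

In this made-up sample the first item is passed by every examinee and so cannot discriminate; it is the kind of 'dead wood' an item analysis would flag. What the figures cannot show is why an item misbehaves, which is the part left to the expert.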

It is, then, the test constructor’s responsibility to see that tests 
‘which look good’ are not released to people who might be tempted 
to put them into service use; and that these and tests ‘with 
assumed validity’ are proved statistically valid; or if this cannot 
be proved, are discarded or set aside until such proof can be 
established. It is the duty of the test constructor to train 
individuals whose job it is to write items ‘with content validity’— 
both those hired by test construction concerns and those attend- 
ing schools of education who will be writing classroom tests for 
years to come; and, while in the teaching situation, it might be 
expected that expertness in the field covered by the test may be 
assumed; in any other situation, expert consultants in subject- 
matter areas must be provided. Finally, it is the responsibility 
of the test constructor to provide information regarding proper 
interpretation of both tests ‘with content validity’ and tests ‘with 
statistical validity’ for test administrators. Such information 
must always describe the group on whom the test was standard- 
ized or for whom it is intended, and provide information regarding 
the validation criterion. 





PREDICTING STUDENT PERFORMANCE IN THE 
FIRST COURSE IN PSYCHOLOGY* 


SLATER E. NEWMAN, CARL P. DUNCAN, GRAHAM B. BELL, 
and KENNETH H. BRADT 


Department of Psychology 
Northwestern University 


The preliminary study reported here represents the confluence 
of three disparate types of research—attempts to determine what 
misconceptions and superstitions the current university student 
bodies hold, attempts to predict student performance, and 
attempts to evaluate the effects of a course in psychology. Each 
of these areas has a history. 

One of the earliest efforts to discern the superstitions of the 
college student was reported in 1919 by Conklin who, over a 
four-year period, had asked students at the University of Oregon 
to report on any superstition they had or had had. He found 
that eighty-two per cent of the group of freshmen had held super- 
stitions at some time or other, and that these superstitions tended 
to persist in spite of “education and the development of reason.” 
Nixon in 1925 prepared a list of thirty true-false items (all correct 
if marked ‘false’) which included some superstitions and 
some psychological misconceptions. Garrett and Fisher revised 
Nixon’s list and administered it to one hundred boys and one 
hundred girls in the New York City high-school system, finding 
a more psychologically conventional attitude among the older 
students. A study by Longstaff compared beliefs of advanced 
psychology students of 1947 with beliefs held by a 1923 advanced 
class and concluded that today’s beginning course is far more 
effective in combatting belief in pseudo-psychology than it was 
two decades ago. Recently, Weitzman administered a multiple-choice 
type examination to entering students at American 
University and found that the students held many mistaken 
beliefs in several areas, including psychology, biology, economics 
and education. 

* The results of this paper were discussed in part at the meeting of the 
American Psychological Association, September, 1950. 

Evidence of the second stream of research, attempts to predict 
academic success, is within the acquaintance of all of us. Most 
of the work has been done within the area of general academic 
success, although lately there have been more attempts to predict 
within specific subject matter fields. A notable example within 
the field of psychology is the study of Carlson, Fischer and Young 
at the University of Illinois, who found that a proficiency-type 
examination given at the beginning of the course could not be 
used to predict the amount of gain a student might be expected 
to achieve in the course. 

The third body of studies concerns itself with the assessment 
of changes (specifically, decreases in the number of misconceptions 
held) which have occurred as a result of study in the 
first course in psychology. Gilliland in 1930 found that there 
was a slight relationship between scores on a revised Nixon list 
(he added ten true items) and course grades, and a substantially 
higher correlation between the two at the end of the course. In 
1941, Dysinger and Gregory reported that statistically reliable 
improvement occurred in answering questions classed as popular, 
semi-popular and technical after the first course, and that previous 
knowledge of psychology and/or level of ability as indicated by 
Nebraska Revision Army Alpha scores, were important deter- 
minants of student psychology grades. 

It was the purpose of our study to determine what type of 
examination might be designed to predict student performance 
in psychology, and to determine whether a test of misconceptions, 
an achievement-type examination, or a more general test of 
scholastic aptitude correlated highest with grades in psychology. 

Students from two of the introductory classes in psychology 
were the subjects. Seventy-four of them were examined on a 
test of misconceptions devised at Northwestern University by 
Buxton. The test consists of one hundred true-false questions of 
the following type: 

“All men are created equal in capacity for achievement.” 

To the other sixty-two students was administered a typical, though 
somewhat more general, achievement examination in psychology, 
which sampled material from each chapter in the course textbook. 
There were sixty four-alternative multiple-choice statements. 
Typical of these is: 

“The child who learns to love and trust his parents and to feel 
secure with them, can be expected to: (a) be free of anxiety the 
rest of his life; (b) be too contented to form affectional bonds with 
outsiders; (c) establish a satisfactory marital relationship later; 
(d) behave in a socially undesirable manner.” 

Students were instructed to do their best on these examinations 
and were told that their performance would have no effect on 
their final grades. The scores on the Ohio State Psychological 
Examination, in terms of percentile of those applying for admission 
to Northwestern, were gathered from the University Testing 
Office. The criterion measure was total points on the departmental 
examinations. These examinations were based exclusively 
on material to be found in the course textbook. There 
were two mid-terms of sixty-five items each, and one final of 
one hundred thirty questions. All items were of the multiple-choice 
or matching variety. Pearson product-moment coefficients 
of correlation were computed for the reliabilities of the two 
new tests and for the criterion, and for the relationships among 
the four present measures. All reliabilities are split-half, corrected 
by the Spearman-Brown formula. The results are listed in 
Table 1. 


TABLE 1.—INTERRELATIONSHIPS AND RELIABILITIES OF THE OHIO 
STATE PSYCHOLOGICAL EXAMINATION, GENERAL ACHIEVEMENT 
EXAMINATION IN PSYCHOLOGY, BUXTON’S TEST OF 
MISCONCEPTIONS, AND GRADES ON PSYCHOLOGY 
DEPARTMENTAL EXAMINATIONS 


Ohio vs. Ohio                        .930 
Ohio vs. Achievement                 .414 
Ohio vs. Buxton                      .437 
Ohio vs. Departmentals               .309 
Achievement vs. Achievement          .570 
Achievement vs. Departmentals        .402 
Buxton vs. Buxton                    .710 
Buxton vs. Departmentals             .369 
Departmentals vs. Departmentals      .930 


The bulk of these coefficients, although all significantly different 
from chance, are quite low. Some injustice has been done to 
the Ohio, in that differing lengths of time passed between the 
students’ taking the Ohio and their enrolling in the introductory 
course in psychology. However, an earlier study at Northwestern 
by Holley showed the correlation between Ohio and 
course grades to be 0.41 for first-quarter freshmen. 
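
The reliabilities in Table 1 are described as split-half coefficients corrected by the Spearman-Brown formula. The sketch below, in Python, shows how such a coefficient might be computed. The odd-even division of items and the function names are assumptions made for illustration, since the paper does not say how the halves were formed.

```python
from math import sqrt
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return cov / sqrt(var_x * var_y)


def spearman_brown(r_half: float) -> float:
    """Step a half-test correlation up to the full-length test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores: List[List[int]]) -> float:
    """Corrected split-half reliability from a matrix of 0/1 item scores.

    Assumes an odd-even split: items in positions 1, 3, 5, ... form one
    half and items in positions 2, 4, 6, ... form the other.
    """
    first_half = [sum(row[0::2]) for row in item_scores]
    second_half = [sum(row[1::2]) for row in item_scores]
    return spearman_brown(pearson_r(first_half, second_half))
```

Read in reverse, the formula gives a sense of scale: a corrected reliability of .570, as reported for the achievement examination, corresponds to a correlation of about .40 between the two half-tests.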

In their present state, none of these measures seems capable 
of helping the instructor separate the potentially good from the 
potentially poor students in his course. Such a use would 
require a rather high validity for the test used to predict. The 
authors feel that the possibility exists that such an instrument can 
be constructed, and the use of a test of misconceptions, or a test 
of ‘psychological background,’ or a combination of both, might 
result in the desired predictive tool. 
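
To give a rough sense of why coefficients of this size are of little help in sorting individual students, the sketch below converts the validity coefficients of Table 1 into two familiar summaries: the proportion of criterion variance accounted for, and the index of forecasting efficiency. Both summaries and the function names are choices made here for illustration; the authors do not report them.

```python
from math import sqrt

# Correlations with the departmental examinations, taken from Table 1.
VALIDITIES = {"Ohio": 0.309, "Achievement": 0.402, "Buxton": 0.369}


def variance_explained(r: float) -> float:
    """Proportion of criterion variance accounted for by the predictor."""
    return r * r


def forecasting_efficiency(r: float) -> float:
    """Proportional reduction in the standard error of estimate,
    relative to predicting the criterion mean for every student."""
    return 1 - sqrt(1 - r * r)


for name, r in VALIDITIES.items():
    print(f"{name}: r={r:.3f}, "
          f"variance explained={variance_explained(r):.3f}, "
          f"forecasting efficiency={forecasting_efficiency(r):.3f}")
```

Even the highest of the three coefficients, .402, accounts for only about 16 per cent of the variance in departmental examination scores.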

The instrument might then be of use to the instructor in at 
least two ways. First, in locating early those students who might 
need extra aid in his course, and second, if ability to succeed in 
the beginning course in psychology is to some extent a special 
ability, then grades on the test might be utilized as a basis for 
advising some students not to take the course at all. 


BOOK REVIEWS 


Victor C. Raimy (Ed.). Training in Clinical Psychology. New 
York: Prentice-Hall, Inc., 1950, pp. 253. 


Large was the need and scarce the supply of clinical psycholo- 
gists after War II. To satisfy this urgent need many sources of 
training facilities were developed. The need for an over-all 
review of all the training activities which have emerged since the 
post-War II period seemed indicated to many of the executive 
members of the American Psychological Association. Two 
questions the group was interested in answering were: 1) What 
are society’s needs for the services that psychologists can provide? 
and 2) What kinds of training are needed to produce good clinical 
psychologists? It was with this in mind that the Boulder con- 
ference was called. The conference was fortunate in having its 
expenses underwritten by grants from the United States Public 
Health Service. A total of seventy-one representatives from 
training universities, mental health service agencies and allied 
professions met daily for two weeks in small subgroups and gen- 
eral sessions. A loose parliamentary procedure was followed 
and opportunity was provided for walking along the range from 
total agreement to total disagreement. Approximately seventy 
propositions were found on which there seemed to be rather gen- 
eral concurrence. These resolutions are found scattered through- 
out Training in Clinical Psychology. Dissenting opinions are 
presented in the text along with the discussion of many resolutions 
on which there was little agreement. The spirit as well as the 
content of this conference is what the editor, Dr. Raimy, tried to 
give in selecting and organizing the material for the present 
volume. In his own words, he “made an attempt to provide a 
more readable organization, one that would better reflect the 
actual spirit and particular emphasis given the topics by conference 
discussions.” 

The material is organized by the editor into seventeen chapters. 
The topics considered include background and organization 
of social needs; professional training 
in relation to social needs, including kinds and levels of training; 
ethics; curriculum problems; training problems—training for 
research, for psychotherapy, for field training; problems of 
selection and evaluation of students; staff training and relations 
with other professions and with government agencies; the problem 
of accreditation of training universities and licensing and certifica- 
tion. The last chapters include problems which would be of 
interest to most people in the field—current issues in the training 
of the clinical psychologist. Issues considered include the prob- 
lem of specialization, problems of private practice, sub-doctoral 
training, post-doctoral training, foreign language requirement, 
the clinical psychologist as scientist and practitioner, the establishment 
of standards for field work, training agencies, coördination 
in student selection, financing of graduate training, the 
medical orientation of clinical psychology and methods of train- 
ing in ethical practices. 

The volume also includes a well-written introduction by Dr. 
Robert H. Felix, director of the National Institute of Mental 
Health of the United States Public Health Service on the problem 
of Mental Health and Clinical Psychologists. It also includes 
two appendices—the 1947 report of the Committee of Clinical 
Training and a listing of participants in the conference. 

The conference considered four possible levels of training; 
namely, 1) Ph.D. in clinical psychology with supervised post- 
doctoral training; 2) Ph.D. with four years in a graduate school 
of which one year is supervised internship; 3) sub-doctoral train- 
ing of approximately two years with or without the M.A. degree; 
4) the B.A. degree with graduate training. The members of the 
Boulder conference agreed on the need for strict maintenance of 
training programs in clinical psychology at the doctoral level and 
recommended that for future entrance into the field the term 
‘Clinical Psychologist’ be reserved for persons who have received 
a doctoral degree based upon graduate education in clinical 
psychology at a recognized university. Also agreed upon was the 
decision that research be given a place of equal and coördinated 
importance with practice in the education of graduate students 
in the clinical area. 

Reporting the sessions of this conference as presented by Dr. 
Raimy has served and is serving as a frame of reference for 
further considerations of some of these emerging professional 
problems considered here. Dr. Raimy deserves the thanks of all 
clinicians for preparing the volume in as readable and useful a 
form to serve such purposes. 
H. MELTZER 

Psychological Service Center 

St. Louis, Missouri 


Maude B. Muse. Guiding Learning Experience. New York: 
The Macmillan Co., 1950, pp. 617. 


Early schools of nursing operated from a line of authority basis. 
The autocratic and militaristic administration of such schools was 
matched by an equally autocratic curriculum for the nurses in 
training. Although modern schools of nursing have been as 
influenced by philosophical and psychological writings as have 
other professional schools, there are still practices in these (again 
as in other sorts of professional schools) that are a legacy from 
the past and that are contra-indicated by research findings and 
acceptable psychological and educational theory. 

Subtitled Principles of Progressive Education Applied to Nurs- 
ing Education, this volume should help dislodge such discredited 
practices by aiding nursing educators to develop consistent view- 
points based upon empirical data and modern theory and, there- 
fore, to instruct with greater effectiveness. Although it is 
designed with classroom and clinical instructors in nursing schools 
at the primary focus, it is not a source of detailed teaching meth- 
ods in nursing education. It is rather a volume that deals 
primarily on the theoretical level with principles of group 
dynamics, teaching, and learning. Reference is constantly made 
to the work of the nursing instructor, and the latter sections of the 
book deal specifically with the applications of principles to 
practical situations. However, the emphasis throughout is upon 
principles and there is an almost too determined effort to expose 
the theoretical constructs underlying general teaching procedure. 
This is not to say that the rigor with which the author insists on 
internal consistency of theory and practice is objectionable. 
On the contrary, the chief value of the book lies in this very 
rigor. But the author has a tendency to obscure the account 
of the development of sound practice by repeated assurance to the 
reader that this or that practice is in consonance with field theory 
or progressive education. 

It is also regrettable that the author uses the rubric ‘Progressive 
education’ so generously. This is a book written for nursing 
educators that ranges from a consideration of philosophical and 
psychological foundations to principles of unit organization and 
teaching methods. Miss Muse has not restricted herself to a 
narrow account of Progressivism as a distinct movement in 
education. Actually her book is far more significant than this. 
It is neither restricted to Progressive education, whatever that 
may mean these days, nor, on the other hand, is it an eclectic 
approach in which the advantages and disadvantages of systems 
are considered. It is an extensive account of field theory sup- 
porting broad aspects of Dewey’s educational philosophy. 
Various theoretical systems and constructs within this framework 
are assayed for pertinence and applicability to the work of the 
nursing educator. Thus the repeated use of the term ‘Progressive 
education’ in one form or another is intrusive and rightly annoying. 

The very comprehensiveness of the volume may be a weakness. 
There is too little space given, at many points, to an adequate 
development of concepts. Parts of the book have an almost 
breathless quality. Student-teachers already at home with the 
systems of thought that are developed would probably find a 
straight-forward analysis of the applications of these systems to 
nursing education more profitable. Students who are untrained 
in these systems of thought will surely find the pace and coverage 
bewildering. Here are the elements of sociological and psycho- 
logical foundations in education presented in one broad sweep 
together with their applications to the specialized field of nursing 
education. This is a formidable load for any volume to bear. 
Although the author has done a commendable job in sketching 
this broad field and in pegging it at salient points, it seems highly 
probable that most readers, except those for whom the book serves 
as review, will run the danger of verbalistic learnings without 
conceptual substance and insight. 

Despite its pace and coverage the book is not superficial. 
Miss Muse has done an important and, it would seem, a quite 
difficult job in tying so many fundamental concepts of teaching 
and learning to the work of the professional nurse educator. The 
book is heavily footnoted with references to a wide range of 
experimental studies and basic writings on theory and practice. 
But it opens new doors and jerks the reader onward before his 
eyes have become adjusted to the rooms he has sped through. 

The Baedeker quality of the writing can be seen by reference 
to almost any section of the book. In a chapter entitled “Two 
Psychological Systems Contribute Pedagogical Principles” 
Miss Muse deals with three contemporary interpretations of field 
theory psychology. She writes: 

“Currently there are in this country three interpretations of 
field theory psychology: (1) Gestalt psychology; (2) so-called 
‘organismic psychology’; and (3) topological psychology. The 
difference in interpretation lies in what is taken as a point of 
reference; in Gestalt psychology it is perceptions; in organismic 
psychology, the human organism; and in topological psychology, 
individual and group relationships. 

“Gestalt Psychology.—The first branch of field theory psy- 
chology to be developed was Gestalt psychology. It originated 
in Germany under Wertheimer, Köhler, and Koffka and reached 
this country after World War I. Gestalt psychology made 
relatively little impression here at first, which has been attributed 
to the vagueness and obscurity of the German scientific phraseol- 
ogy (27, Preface). Outstanding exponents of Gestalt psychology 
in America have been Wheeler (76,77) and more recently Hart- 
man (27,29,54); the teaching and writings of many other out- 
standing psychologists and Progressivists reveal that they are 
utilizing concepts of two or three of the current interpretations of 
field theory psychology, even though they may make no claims of 
being field theorists. 

“Organismic Psychology.—As might be anticipated, organismic 
psychology is based on the modern biological concept of the 
human organism which is seen to develop and to operate accord- 
ing to field theory laws. It finds support in the theories of 
Twentieth Century biologists and neurologists, including Child, 
Coghill, Cannon, Lashley, Goldstein, Woodger, and others (54, 
p. 173). For these reasons organismic psychology may have 
greater appeal for nurses than the other interpretations. 

“Topological Psychology—The third interpretation of field 
theory psychology, topological psychology, is presented in the 
Forty-first Yearbook of the National Society for the Study of 
Education (54) under the title of field theory psychology. It has 
been developed by Kurt Lewin and his associates since the middle 
1920’s, (44,45). Topological psychology finds its wholes in 
individual and group relationships (11, p. 150) and makes use of 
terms from mathematics and physics. Drawings or diagrams are 
used to clarify its concepts (p. 264). Topological psychology is 
responsible for use of the terms ‘vector’ and ‘valence’ in current 
books on social psychology and problems of education (59, 
pp. 268-272). Lewin’s concept of action is considered later in 
this chapter.” 

The author is not done with these systems after so brief an 
introduction. As stated earlier, the entire book is stringently 
rigorous in its adherence to field theory psychology. But the 
foregoing excerpt is indicative of the pace, the language, and the 
monographic citing so replete in the entire book. 

The volume is organized into four units. The first develops a 
philosophical frame of reference. Dispensing quickly but quite 
fairly with traditional education and “so-called ‘free-school’ 
education, an early (but lingering) misinterpretation of the Dewey 
philosophy,” Miss Muse develops the concepts of pragmatism 
and offers an excellent overview of the characteristics of Progres- 
sive education. 

Unit II further develops the concepts of experimentalism and 
the principles that underlie educative experience as seen by the 
pragmatists. Principles of motivation and integration are well 
developed and lead to an analysis of configurationism, connection- 
ism, and various interpretations and applications of field theory 
psychology. 

Unit III considers the problems of modifying and developing 
teaching practice to bring it in line with acceptable educational 
philosophy and modern concepts of learning theory. The final 
unit consists of two chapters devoted to principles of organization 
of learning materials and activities. 

If nursing educators in training can be stimulated to study this 
volume, to engage in a serious checking of a reasonable amount of 
the references cited, and to cogitate the potential meanings 
developed, they should profit greatly. Despite its focus on 
nursing education, the average doctoral candidate in any phase of 
education would do well to read the volume with care. Despite 
the over-use of certain rubrics, this is a sound book. A great 
deal is packed within one binding. But it is rewarding reading 
for students in education who have the percipience to take its pace 
and the perseverance to think on its substance. 
R. WILL BURNETT 

University of Illinois. 


Martha May Reynolds. Children from Seed to Sapling. 
Second Edition. New York: McGraw-Hill Book Co., 1951, 
pp. 334. $3.75. 


The study of children is not mere casual observation, nor does it 
need elaborate equipment and techniques. The former may 
afford amusement but not understanding; the latter is necessary
for scientific investigation. However, there is a middle ground 
which leads to valuable understanding for those who work with 
children, such as teachers, clinical psychologists, social workers, 
or parents. While these specialists need an appreciation of 
children and their problems they cannot be expected to become 
scientific child psychologists. It is for this middle group that this 
book has been written. 

Throughout the book there is emphasis on learning to observe 
and study children, not on children themselves. However, 
adequate observation presupposes sufficient background so that 
the observer sees behaving children with some insight. There- 
fore, characteristics of physical condition, growth, education, 
ability, socialization and so on are presented in summary for 
various age levels. Chapters are devoted to age groups divided 
into babyhood (1-2), pre-school (2-4), early childhood (5-7), 
transition eight-year-olds, later childhood (9-11), early adolescents
(12-14), and older adolescents (15-17). These divisions are
on a basis of convenience particularly in relation to the possible 
sources of children for observation. 

The book is written in an informal, stimulating style. Each
paragraph asks questions or makes suggestions helpful to intel- 
ligent observation. Children are people—while they may be 
studied in groups it is as individual personalities that they must be 
understood. As a textbook for college students or as a study 
guide for parents and other older persons interested in children 
this volume should prove of real value.
C. M. LOUTTIT

University of Illinois 


RUTH STRANG. An Introduction to Child Study. Third Edition.
New York: The Macmillan Company, 1951, pp. 705. $4.75. 


The teacher who is interested in a systematic, theoretically 
integrated text will probably not want to use this third edition of 
Professor Strang’s text. On the other hand, it may well find wide 
student acceptance as a text designed to cover the whole field with 
maximal assistance to the student. The volume is very well 
organized and set up for note-taking ease. The five sections, 
each dealing with a separate age group, are divided into chapters 
which are broken down into paragraph-length subheadings. 
Each section is followed by a summary, a list of questions and
problems, an unusually up-to-date and comprehensive bibliog- 
raphy, and a list of films. Four of the sections have chapters 
which briefly describe techniques for studying children of the age 
group discussed in the section. 

Despite these virtues, however, this reviewer found it to be an 
unexciting volume. In the author’s effort to include every point 
of view, none of them seems adequately presented. In her effort 
to produce a readable and interesting volume, Professor Strang’s 
style struck the reviewer as becoming platitudinous, patronizing, 
and repetitious. The thirteen pictures have captions only 
vaguely related to their content, and they do not integrate into 
the text. While the bibliography is impressive, only a very small 
proportion of the items are mentioned in the text. The few that 
are, are frequently only casual references. While there are 994 
different names in the index of names, only twenty-five are 
mentioned more than once in the text, and most of the index 
references are to the section bibliographies. 

Professor Strang does include a great many examples of 
children’s behavior and a lot of very interesting case material. 
Despite the style, the book does communicate a great deal more 
warmth and understanding of children than do many of our 
more erudite texts. In terms of content, this reader found little
to take exception to, except the feelings expressed above, and the
impression that frequently concepts were unnecessarily
oversimplified with a resultant loss in precision.
IRVING LAZAR

University of Illinois 


Oscar K. Buros, Editor. Statistical Methodology Reviews, 1941- 
1950. New York: John Wiley and Sons, Inc., 1951, pp. 457. 


In the past decade there have appeared three hundred forty 
books which Buros finds significant enough to list in this third 
compilation of book reviews. About two-thirds are represented 
by reviews rather than mere listings. A small computation 
shows that on the average two large double-spaced pages of 
review are provided for each of these books. The reviews are 
quoted from periodical sources. 

The books cover the whole wide range of contemporary statisti- 
cal interest, from introductory manuals for applied courses to 
advanced treatises in mathematical statistics. Opening at
random, one finds a review of a 1947 reprinting of an obsolete 
monograph, a text on mathematical statistics published in India, 
a monograph showing how chi-square can be used in anthropology, 
and Churchman on experimental inference. Reviews in educa- 
tion and psychology deal with books by Edwards, Garrett, 
Guilford, Palmer Johnson, and the like. 

A reviewer must question whether this collection has enough 
value to warrant the care Buros and his publisher have expended 
on it. The frequency with which the book will be consulted by 
any one reader seems small. But, the volume having been 
produced, everyone interested in statistics will enjoy an examina- 
tion of it. The collection gives a sense of trends in statistics, 
and calls attention to books that one wishes to know more about. 
A sizable minority of the reviews are important critical papers. 

Of sufficient interest to mention in particular here are Thom- 
son’s comment comparing Finney’s probit analysis to the Con- 
stant Process (p. 129), Kelley and Hotelling on factor analysis 
and the error distribution of factor loadings (pp. 191-195), Tukey
on the logic of factor analysis (very good) (p. 363), and the inter- 
change between Deming and Peatman on sampling and polling 
(pp. 307-310). It is possible that fifteen reviews for Kelley’s
Fundamentals, for example, passes the point of diminishing 
returns, but the ‘symposium’ shows a fascinating range of opinion. 
There is a good deal of criticism, but each reviewer attacks different
points, and often the very points the man on another page endorsed.
This sort of lesson needs to be brought home to writers of books 
and of reviews, to teachers of statistics, and to advanced students. 

LEE J. CRONBACH 


University of Illinois 